SYSTEM AND METHOD FOR PROVIDING
AND USING UNIVERSALLY ACCESSIBLE 
VOICE AND SPEECH DATA FILES 

This application is a continuation of Ser. No. 08/748,943 filed Nov.
14, 1996.

BACKGROUND OF THE INVENTION 

1. Field of the Invention 

This invention relates generally to the construction and 
use of distributed interactive voice and speech processing 
systems, including interactive voice response (IVR) systems
and voice messaging (VM) systems. More particularly, the 
invention relates to form based publishing of voice infor- 
mation and the use of universally accessible personal pro- 
files for authentication of the user by voice signatures and 
generating context sensitive active vocabularies to improve 
speaker dependent speech recognition. The invention also 
relates to the use of the user attributes and preferences stored 
in universally accessible personal profiles to improve the 
efficiency of navigation and search as well as efficacy of 
search results pertaining to user queries. 

2. Description of the Related Art 

Conventional interactive voice response (IVR) systems 
allow a user to place a telephone call into a system, navigate 
(generally using touch tone input) through a hierarchy of 
options in response to voice prompts and retrieve informa- 
tion stored in a computer database. Airlines, banks, credit 
companies and many other service organizations are just a 
few examples of the types of businesses using IVR systems 
to allow a customer (or prospective customer) to retrieve 
desired information. These conventional systems are gener- 
ally organization-specific in that they offer access to a single 
database or set of databases related to the goods, services or 
other aspects of the organization maintaining the IVR sys- 
tem. Thus, conventional IVR technology is used to offer 
access to information specific to a single organization (i.e. a 
specific airline, bank or credit company). For example 
airlines typically use IVR to allow callers to access flight 
arrival and departure information or to select reservation 
options, for the particular airline only. 

It is desirable to provide an IVR system that enables 
access to an aggregation of databases and services rather 
than a single database and service. One barrier to the 
provision of aggregated services in an IVR system is that 
conventional IVR systems do not have a distributed infor- 
mation publishing means. Conventional IVR systems do not 
have a mechanism for service/information providers to 
readily access the IVR system and add updated or entirely 
new information for publication on the IVR system. 

Further, conventional IVR systems are generally config- 
ured for uniform access by any caller admitted to the IVR 
system. Each caller is handled by the system in the same 
manner and offered an identical set of options. One reason 
that IVR systems use uniform user interfaces for each caller 
rather than caller-specific configurations is that conventional 
IVR systems operate in "closed" computer environments 
hosting the particular IVR system. Thus, when a caller 
accesses a conventional IVR system, the only caller-specific 
information which the system has at its disposal, is any 
information previously provided by the caller which the 
system has maintained or any information that is provided 
by the caller during the IVR session (i.e. when a user enters 
an account number using touch tone telephone input). 
Because, however, collecting and storing caller-specific 
information with conventional technology is cumbersome 
and time consuming, most IVR systems do not offer caller-
specific (caller customized) features.

There are numerous applications in which it is desirable
for an IVR system to use caller-specific information in
handling a call. Caller-specific information in the form of
user preferences can aid in minimizing the size of a command
tree which the user must navigate to access desired
information. Additionally, caller-specific information could
also be used to authenticate the identity of a user in cases
where security is an issue (i.e. in bank and credit contexts).
Further, caller-specific speech training profiles could be used
to implement speaker dependent speech recognition to allow
for a caller to use voice commands in place of touch-tone
commands. Still further, an IVR system having access to
caller-specific data could be used to apply IVR technology
in new application areas such as personal productivity.

Thus, there is a need for an improved voice and speech
processing system that provides universal access to caller-specific
information to provide user-customized IVR systems.
Further, there is a need to provide universal access to
voice and speech files in order to allow widespread use of
such files for caller authentication and for performing
speaker dependent speech recognition in IVR systems.

SUMMARY OF THE INVENTION

The system and method of the present invention extends
World Wide Web (referred to herein as "www" or the "web")
and Internet technology to provide universally accessible
caller-specific profiles that are accessed by one or more IVR
systems. The invention features a set of web pages containing
information (components) formatted using MIME and
hypertext markup language (HTML) standards with extensions
for voice information access and navigation. These
web pages are linked using HTML hyper-links that are
accessible to users via voice commands and touch-tone
inputs. These web pages and components in them are
addressable using HTML anchors and links embedding
HTML universal (uniform) resource locators (URLs) rendering
them universally accessible over the Internet. This
collection of connected web pages is referred to herein as
the "voice web" and the individual pages are referred to
herein as "voice web pages". Each web page in the voice
web contains a specially tagged set of key words and touch
tone sequences that are associated with embedded anchors
and links used for navigation within the web.

In addition, the invention features a set of linked HTML
pages representing the user's "personal profile". The personal
profile contains the user's attributes and preferences.
Attributes include the user's name, address, phone number,
personal identification code, voice imprints for
authentication, speech training profile and other information.
Preferences include configuration preferences such as
personal greetings and gender and language selection, selection
preferences such as bookmarks and favorite places, and
presentation preferences such as priority ordering, default
overrides and preferred vocabulary.

The personal profile is designed for component access
within web pages allowing easy extraction of context sensitive
profile information. In particular, speech training
profiles (included as a user attribute and which contain word
patterns representing speaker dependent training
information) are partitioned into sets of related words likely to
occur in combination within corresponding voice web
pages. A set of command and control words such as "play,
pause, continue, previous, next, home, reload, help, etc." are
stored in a top level component set enabling user dependent
but context independent navigation and control. Other component
sets are designed to match the key word sets in
corresponding voice web pages such as a calendar page or 
an address book page enabling user and context dependent 
navigation and control. 
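
The partitioning described above can be sketched in a few lines
of Python. This is only an illustration under stated assumptions:
the class name SpeechProfile, the page names and the byte strings
standing in for word patterns are all hypothetical, and the actual
profile is stored as linked HTML pages rather than in memory.

```python
from dataclasses import dataclass, field


@dataclass
class SpeechProfile:
    """Hypothetical in-memory stand-in for a speech training profile."""
    # Top level component set: context independent command words.
    top_level: dict
    # One component set per voice web page, keyed by an assumed page name.
    page_sets: dict = field(default_factory=dict)

    def active_vocabulary(self, page, page_key_words):
        """Return command patterns plus the patterns for this page's key words."""
        active = dict(self.top_level)
        component = self.page_sets.get(page, {})
        for word in page_key_words:
            if word in component:
                active[word] = component[word]
        return active


# Placeholder byte strings stand in for speaker dependent word patterns.
profile = SpeechProfile(
    top_level={"play": b"p1", "pause": b"p2", "help": b"p3"},
    page_sets={
        "calendar": {"today": b"c1", "tomorrow": b"c2", "appointments": b"c3"},
        "address_book": {"lookup": b"a1", "dial": b"a2"},
    },
)

vocab = profile.active_vocabulary("calendar", ["today", "appointments"])
print(sorted(vocab))
```

Only the command words and the two calendar key words become
active, so the recognizer compares an utterance against five
patterns instead of the whole profile.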

When a user calls into the distributed voice and speech 
processing system associated with the voice web, the system 
first identifies the user utilizing a unique account number 
(such as phone number or social security number). Next, it 
accesses the user's personal profile using the corresponding 
URL and retrieves the user attributes and preferences related 
to authentication and security. Using this personal profile 
information, the voice web system authenticates the identity 
of the user using a combination of personal identification 
code based password checking and voice imprint matching. 
The voice imprint is any sufficiently long utterance or phrase 
that the user has previously entered into his/her profile. Each 
user's voice imprint is analyzed and stored in the profile for 
quick matching on demand with a real-time provided user 
sample. The combination of every individual's unique vocal 
characteristics stored in the voice imprint coupled with the 
random choice of the password phrase ensures a high degree 
of security and authentication. 
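
A minimal sketch of the combined check follows. The text does not
specify how a voice imprint is matched, so cosine similarity over
a small feature vector is used here purely as a stand-in; the
field names, the threshold and the sample values are likewise
assumptions made for illustration.

```python
import hmac
import math


def cosine_similarity(a, b):
    """Similarity of two equal-length feature vectors (stand-in matcher)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def authenticate(profile, entered_code, live_features, threshold=0.9):
    """Accept only if both the identification code and the voice sample match."""
    code_ok = hmac.compare_digest(profile["pin"], entered_code)
    voice_ok = cosine_similarity(profile["voice_imprint"], live_features) >= threshold
    return code_ok and voice_ok


# Hypothetical profile contents; a real imprint would be derived from speech.
stored = {"pin": "4821", "voice_imprint": [0.2, 0.7, 0.1, 0.5]}

accepted = authenticate(stored, "4821", [0.21, 0.69, 0.12, 0.5])
wrong_voice = authenticate(stored, "4821", [0.9, 0.1, 0.8, 0.0])
wrong_code = authenticate(stored, "0000", [0.21, 0.69, 0.12, 0.5])
print(accepted, wrong_voice, wrong_code)
```

Requiring both factors means that a stolen identification code
alone, or a recording matched without the code, is rejected.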

Once authenticated, the user is allowed to navigate and 
access more information from the voice web using voice 
commands. In order to effectively accomplish this task, the 
voice web system retrieves the context independent com- 
mand and control key word set from the user's speech 
profile. 

The voice web system then presents a top level voice web 
personal home page for user's perusal. At the same time, it 
retrieves the set of word recognition patterns associated with 
the key words in the presented page from the user's speech 
profile. Thus, the system is able to match the active vocabu- 
lary and associated speaker dependent word patterns 
dynamically in a context sensitive manner. The process 
continues as the user navigates from page to page. The voice 
web system dynamically retrieves the suitable subset of 
training word patterns from the user's speech profile matching
the voice navigation key words in the page being
presented to the user.

The process described above greatly reduces the size of
the training information that needs to be retrieved at any
time while significantly enhancing accuracy of speech recognition
using speaker dependent training profiles. Since the
speech profile is constructed using HTML pages and 
components, it is universally accessible using its URL. This 
enables the user to call into any compatible Internet con- 
nected voice web system in user's proximity from anywhere 
in the world, identify himself/herself to the system and then 
enable the system to dynamically retrieve suitable informa- 
tion that enhances his/her navigation and access of the 
information stored in the voice web using voice commands 
and input. 

In addition to the user attribute information discussed 
above, the personal profile contains user preferences relative 
to configuration, presentation and information selection. 
These preferences are components within the personal pro- 
file pages and are easily available to the voice web system 
for dynamic retrieval. For example, if the user requests 
his/her stock portfolio from the voice web, it first retrieves 
the user's preferred portfolio of companies from his/her 
profile and applies this list to limit the search on stock quotes 
from all companies. The user gets exactly the information
relevant to his/her interest in exactly the order of priority 
he/she prefers. 
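
The stock portfolio example can be sketched as a simple filter.
The function name, the ticker symbols and the prices below are
invented for illustration, and the preferred list is assumed to
be stored in the profile already in priority order.

```python
def customize_quotes(all_quotes, preferred_symbols):
    """Keep only preferred companies, in the subscriber's priority order."""
    return [(s, all_quotes[s]) for s in preferred_symbols if s in all_quotes]


# Fictitious quote data standing in for a full stock quote source.
all_quotes = {"AAA": 10.5, "BBB": 22.0, "CCC": 7.25, "DDD": 40.0}

# Fictitious preference list retrieved from the personal profile.
preferred = ["CCC", "AAA", "ZZZ"]

result = customize_quotes(all_quotes, preferred)
print(result)
```

Iterating over the preference list rather than the quote source
is what preserves the subscriber's ordering; a symbol with no
available quote is skipped.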






BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1 is a functional block diagram of a voice web 
system in accordance with the present invention. 

FIG. 2A is a functional block diagram of the voice web 
system shown in FIG. 1 configured to provide voice web 
services. 

FIG. 2B is a functional block diagram of an exemplary 
calendar service. 

FIG. 2C is a functional block diagram of an alternative 
configuration of a voice web system in accordance with the 
present invention. 

FIG. 3 illustrates a personal voice web used to provide
personal services using the system shown in FIG. 2A. 

FIG. 4 illustrates a hierarchy of speech training pages that 
correspond to the service pages shown in FIG. 3. 

FIG. 5 illustrates a hierarchy of attributes and preferences 
pages that correspond to the service pages shown in FIG. 3. 

FIG. 6 is a flow diagram of a subscriber authentication 
method used in the delivery of the personal voice web 
services shown in FIG. 3.

FIG. 7 is a flow diagram of an enhanced speech recognition
process used in personal voice web systems shown
in FIG. 3.

FIG. 8 is a flow diagram of a query customization process 
in accordance with the present invention. 

FIG. 9 is a flow diagram of a voice publishing method in 
accordance with the present invention. 

FIG. 10 is a system diagram of a business-yellow-order 
page system in accordance with the present invention. 

DESCRIPTION OF A PREFERRED 
EMBODIMENT 

The figures depict a preferred embodiment of the present 
invention for purposes of illustration only. One skilled in the 
art will readily recognize from the following discussion that 
alternative embodiments of the structures and methods illus- 
trated herein may be employed without departing from the 
principles of the invention described herein. 

System Description 

FIG. 1 is a functional block diagram of a voice web
system 100 in accordance with the present invention. Voice
web system 100 extends the conventional internet and world
wide web ("web" or www) technology to voice and speech
processing applications and also enables new uses for interactive
voice response (IVR) technology. Voice web system
100 includes one or more voice web sites 102 coupled to one
or more voice web gateways 105 via the Internet 101. Voice
web sites 102 and voice web gateways 105 transfer files over
Internet 101 in accordance with hypertext transfer protocol
(HTTP). A subscriber 107 accesses the voice web system
100 by coupling to the gateway 105 using a telephone 111
coupled to the public switched telephone network (PSTN)
109.

Internet 101 is a system of linked communications networks
that facilitate communication among computers
which are coupled to internet 101. Generally, internets such
as Internet 101 facilitate communication by providing file
transfer, electronic mail and news group services. Internet
101 is preferably the Internet which evolved from the
ARPANET and which is publicly accessible world wide. It
should be understood, however, that the principles of the
present invention apply to other internets and even closed
(private) networks such as corporate intranets.
It should be noted that system 100 may include numerous
voice web sites 102 and numerous voice web gateways 105. 
A single voice web site 102 and a single voice web gateway 
105 are shown in FIG. 1, however, to keep the figure 
uncluttered. Thus, voice web system 100 is a collection of 
voice web gateways 105 and voice web sites 102 connected 
over internet 101 enabling subscribers 107 to access voice 
web pages 103 via their telephones as shown in FIG. 1. 

A voice web page 103 is a web page specified using a
navigable markup language that includes voice extensions.
A navigable markup language is an enhanced type of
markup language that facilitates publication, navigation and
access of information stored in documents specified in the 
navigable markup language. An exemplary markup lan- 
guage is the Hypertext Markup Language 2.0, RFC1866, 
HTML working group of Internet Engineering Task Force, 
Sep. 22, 1995, edited by D. Connolly published on the www 
at the following uniform resource locator (URL) address: 
http://w3.org/pub/www/Markup/html-spec. 

A markup language is a language that includes a set of 
conventions for marking portions of a document so that, 
when accessed by a parsing program such as a web browser, 
each marked portion is presented to a user with a distinctive 
format. In contrast to formatting codes used by word pro- 
cessing programs, markup language codes, called tags, do 
not specify exactly how the tagged portion should be presented.
Instead the tags inform the web browser (parser) that
the information is in a certain portion of a document such as 
title, heading, form or text and the like. The web browser 
(parser) determines how to present the tagged information. 

A navigable markup language is an enhanced markup 
language that uses tags that are anchors and that are links. 
When these link and anchor tags are invoked, a user is then 
presented another navigable markup language document in 
accordance with the link and anchor tags. This link is 
sometimes called a hyperlink. A hyperlink is a reference to 
another markup language document which when invoked 
facilitates access of the referenced markup language docu- 
ment. 

A navigable markup language thus uses attributes, tags 
and values that enable (i) a publisher to specify the presen- 
tation of information to a user; (ii) a user to interactively 
access the stored information; and (iii) a user to access other 
navigable markup language documents using hyperlinks. 

The navigable markup language used to specify voice 
web pages 103 is Hyper Voice Markup Language (HVML). 
HVML is a version of HTML that includes voice extensions
as described in Appendix A, incorporated herein by refer- 
ence. Voice web pages 103 include HVML tags and 
attributes that extend HTML to facilitate publication, navi- 
gation and access to voice information. For example, HVML 
specifies functions and protocols that facilitate voice and 
speech processing including voice authentication, speaker 
dependent speech recognition, voice information publishing 
(e.g. creating a voice form) and voice navigation. 

Just as conventional web documents are displayed for the 
user, voice web documents 103 are "played" to a subscriber 
over a telephone. A voice web page 103 is played (by voice 
web browser 106) by sequentially presenting the embedded 
voice components according to the HVML and MIME 
specifications. 

While a conventional web site enables on-demand access
over an internet to conventional web pages, voice web site
102 enables on-demand access to voice web pages 103.
Voice web site 102 is a computer that hosts voice web pages
103 and serves them up to other computers (i.e. voice web
gateway 105). More specifically, voice web server 102 is a
computer configured with conventional web server software
112 and which has access to stored voice web pages 103. A
voice web site 102 additionally optionally includes a subscriber
directory 104 that stores a list of registered system
subscribers. Voice web site 102 stores, serves and manages
voice web pages 103 and can execute associated external
scripts or programs in accordance with the present invention.
These external scripts and programs interface with
databases and other information sources both internal and
external to web site 102.

Voice web gateway 105 is a computer connected to the
internet 101. Voice web gateway 105 also includes a conventional
voice telecommunications interface 114 for coupling
to the public switched telephone network (PSTN) 109
for telephonic communications with a subscriber 107. Telephone
111 is any voice enabling telecommunications device.
Exemplary telephones include conventional desktop
telephones, portable telephones, cellular telephones, analog
telephones, digital telephones, smart phones and a computer
configured to operate as a telephone and perform telephonic
functions. Thus voice web pages 103 are universally accessible
from any ordinary telephone 111. Alternatively, a
subscriber 107 may access voice web pages 103 either by
using a subscriber interface local to voice web gateway 105
(i.e. a direct user interface with voice web gateway 105) or
by dialing into voice web gateway 105 using another computer
such as a personal digital assistant or a smart phone.

Voice telecommunications interface 114 serves as an
interface between a voice web browser 106 and telephone
111 and preferably includes conventional telephony and
voice processing hardware and software enabling voice web
gateway 105 to receive and answer telephone calls, respond
to touch tone and voice commands, route and conference
calls, play voice prompts and record voice messages.

Voice web gateway 105 additionally hosts a voice web
browser 106. Voice web browser 106 is a computer program
capable of accessing and processing voice web pages 103 in
response to a request placed by subscriber 107. More
specifically, voice web browser 106 (i) processes voice and
touch tone activated subscriber commands, (ii) retrieves
requested voice web pages 103 from the appropriate voice
web site 102, (iii) interprets the embedded markup language
(HVML) in the retrieved voice web page 103 and (iv)
delivers the contents of a voice web page 103 to a subscriber
107 over the telephone 111. In performing the above-mentioned
processing, voice web browser 106 executes
scripts, including "voice scripts" embedded in a voice web
page 103. Voice web browser 106 provides a subscriber 107
with fast, easy, convenient voice activated navigation and
access to voice web pages 103.

Voice web browser 106 is a conventional web browser
modified with appropriate voice information playback and
recording extensions and enhancements. Appendix A
includes a specification of HVML and voice web browser
commands and is incorporated herein by reference.

Some voice web pages 103 contain references to scripts
and programs that operate as service agents 110 to respond
to subscriber requests as well as external events and carry
out prescribed actions. These scripts and programs are
externally stored on voice web sites 102 (for example as
Common Gateway Interface (CGI) scripts or Internet Server
Application Programming Interface (ISAPI)
programs). These external scripts and programs execute in
the voice web server 102 environment as a service agent
110. The external scripts and programs that comprise service
agents 110 are referred to by URLs embedded in an associated
voice web page 103. In the case of a voice web page
103 that is a voice form, the script or program associated
with the service agent executes in response to voice form
submission by a subscriber 107. Service agents 110 follow
standard Internet protocols such as HTTP, and conform to
conventional formats such as MIME and application programming
interfaces (APIs) such as CGI and ISAPI.

HVML Description 

Conventional web pages are designed primarily for pre- 
sentation on a computer color monitor and navigation by a 
mouse and keyboard. As such, graphics, images and text are
the primary media types supported widely. Although audio,
video and 3-dimensional graphics extensions are becoming 
available, these extensions are directed primarily at com- 
puter users and not telephone users. 

Voice web pages 103 consist of HTML pages that have
been extended with Hyper Voice Markup Language
(HVML) for easy and effective navigation and access of
voice information via a voice activated device such as an
ordinary telephone. Voice web pages 103 retain all the
properties and behavior of conventional HTML pages such
as HTML markup tags, universal identifiers (URLs), and
hyper-links and can be accessed by a conventional web
browser using HTTP protocols from a conventional web
server. The additional markup tags are interpreted by an
HVML extended web browser to enable subscribers 107 to
navigate and access voice web pages 103 over the phone or
similar voice activated device. Appendix A includes a specification
of HVML and voice web browser commands and is
incorporated herein by reference.

HVML web pages (voice web pages 103) are specially
designed for presentation using an ordinary telephone 111
and navigation using touch tones and voice commands. This
is in contrast to conventional multimedia web pages that
may embed audio data to be presented on a multimedia
personal computer using its speakers and navigated using its
mouse, keyboard and microphone. Although HVML voice
web pages 103 can be embedded in generic multimedia web
pages, thus sharing some of the information, they are
designed to be presented using an ordinary phone and
navigated using commands generated by touch tone signals
and speech recognition.

An HVML web page (voice web page 103) is first and
foremost an HTML page. Each web page 103 has a unique
universal resource locator (URL) (also called uniform
resource locator). A URL is a string of characters that
uniquely identifies an internet resource including (i) an
identification of the access protocol to be used; (ii) an indication
of resource type; and (iii) an identification of its location in
the computer network. For example, the following fictitious
URL identifies a www document:
http://www.voiscorp.com/banner.gif. This URL uniquely
identifies the location of a resource on the world wide web
computer network: "http://" indicates
the access protocol, "www.voiscorp.com" is the domain
name of the computer on which the resource is located,
"banner" is the name of the resource located on the computer
specified by the domain name, and "gif" indicates that the banner
resource is a gif (graphical interchange file) type resource.
Similarly, the following fictitious URL uniquely identifies
the location of a voice web page 103:
http://www.voiscorp.com/voicememo.hvml. In this example,
"voicememo" is the name of the resource located on the
computer specified by the domain name, and "hvml" indicates
that the voicememo resource is an hvml type resource. Thus,
web pages 103 are each uniquely identified by their corresponding
URL. Once located, a web page 103 can be
created, edited and played using existing web publication
tools; it can be stored on any conventional web server
anywhere on the Internet; it can be accessed by any conventional
web browser and presented on a computer
monitor; it can be navigated using the computer's mouse,
keyboard, and (with some additional plug-ins) microphone;
and it can contain embedded anchors and hyper links to
other HTML pages, including other HVML pages.

Voice web pages 103 are designed for three primary
purposes: (i) presenting structured voice information to a
user; (ii) enabling the user to navigate across and within
voice pages; and (iii) capturing user input for information
queries or submission.

a. HVML Presentation. Presentation of voice information
is accomplished primarily by the voice tag. The voice tag has
a type attribute which specifies the type of voice information
to be presented. If the type attribute has the file value, the
voice information is obtained from a voice file specified by
its URL. If the type attribute has the text value, the voice
information is synthesized from the specified text. If the type
attribute has number, ordinal, currency, date, or character
value, then the voice information is generated by concatenating
voice fragments from a pre-recorded indexed system
voice file. If the type attribute has the stream value, then the
voice information is obtained from the voice stream specified
by its URL. Composition of several voice elements into
a seamless voice string is accomplished by the voice-string
tag.

Combining these tags, publishers can compose and 
present: (i) pre-recorded voice prompts and messages; (ii) 
voice prompts generated using text-to-speech technology; 
and (iii) pre-formatted voice prompts with dynamic speech
synthesis elements. 
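
The handling of the type attribute can be sketched as a dispatch
on its value. The function names, the tuple-based playback plan
and the greeting file URL below are hypothetical; the actual tag
syntax is defined in Appendix A and is not reproduced here.

```python
# Type values that are rendered by concatenating pre-recorded fragments.
CONCATENATED_TYPES = {"number", "ordinal", "currency", "date", "character"}


def plan_voice_element(type_value, content):
    """Map a voice element's type attribute to an assumed playback step."""
    if type_value == "file":
        return ("play_file", content)            # content is a voice file URL
    if type_value == "text":
        return ("synthesize", content)           # text-to-speech
    if type_value in CONCATENATED_TYPES:
        return ("concatenate_fragments", type_value, content)
    if type_value == "stream":
        return ("play_stream", content)          # content is a stream URL
    raise ValueError(f"unknown voice type: {type_value!r}")


def plan_voice_string(elements):
    """Compose several voice elements into one ordered playback plan."""
    return [plan_voice_element(t, c) for t, c in elements]


plan = plan_voice_string([
    ("file", "http://www.voiscorp.com/greeting.vox"),  # hypothetical URL
    ("text", "You have"),
    ("number", "3"),
    ("text", "new messages"),
])
for step in plan:
    print(step)
```

The example mixes a pre-recorded prompt, synthesized text and a
dynamically generated number, which corresponds to the third kind
of prompt listed above.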

b. HVML Navigation. Navigation of voice web pages 103 
is primarily accomplished by extending the HTML anchor 
tag with new attributes — tone and label. These attributes are 
used in conjunction with the existing href attribute in an 
anchor element that makes the anchor into a hyper link. 
When the user selects the touch tone signals specified by the
value of the tone attribute or utters the word specified by the 
label attribute, the browser invokes the corresponding hyper 
link. The tone and label attribute values must be unique 
within a page. Navigation is also accomplished by system 
commands such as next, previous, reload, home, bookmarks, 
help, fax, and history which are invoked by specific touch 
tone sequences or utterance of the words. Users can control 
the voice browser operations by issuing system commands 
such as stop, start, play, pause, exit, backup, and forward. 
Using these attributes, publishers can enable (i) touch tone 
command and control and link navigation; (ii) pre-defined, 
system and user specific, spoken command and control key 
word recognition; and (iii) page and user specific spoken
command and control key word recognition. 
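
As an illustration of how a browser might index the tone and
label attributes, the sketch below uses Python's standard
html.parser module. The sample markup and the file names are
assumptions based on the description above, not the syntax given
in Appendix A.

```python
from html.parser import HTMLParser


class NavigationIndex(HTMLParser):
    """Collect hyper links by touch tone value and by spoken label."""

    def __init__(self):
        super().__init__()
        self.by_tone = {}
        self.by_label = {}

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href is None:
            return  # an anchor without href is not a hyper link
        for key, table in (("tone", self.by_tone), ("label", self.by_label)):
            value = attributes.get(key)
            if value is None:
                continue
            # Tone and label values must be unique within a page.
            if value in table:
                raise ValueError(f"duplicate {key} value {value!r} within page")
            table[value] = href


# Hypothetical page fragment with two voice navigable links.
page = """
<a href="calendar.hvml" tone="1" label="calendar">Calendar</a>
<a href="mail.hvml" tone="2" label="mail">Mail</a>
"""

index = NavigationIndex()
index.feed(page)
print(index.by_tone)
print(index.by_label)
```

With both tables built, a recognized touch tone or a recognized
word resolves to the same href, so either input invokes the same
hyper link.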

c. HVML Forms. HVML uses the form tag to enable user
input similar to HTML including the method attribute which
specifies the way parameters are passed to the server and the
action attribute which specifies the procedure to be invoked
by the server to process the form. HVML extends the input
tag within forms by introducing the voice-input tag. Voice-input
takes a type attribute similar to the input tag with three new
values "voice", "tone" and "review" in addition to the
existing "reset" and "submit" values. The HVML browser
pauses at each voice-input statement in an HVML form until
the specified input is supplied or input is terminated, before
processing the remaining form. Using these tags and
attributes, publishers can enable: (i) touch tone command 
and control and parameter input; (ii) pre-defined, user 
specific, spoken alphabet and digit input; (iii) page and user 
specific, spoken key word and proper names input; and (iv) 
free form voice information input. 

Operational Description of the Voice Web Browser 

Syntactic and structural intelligence, such as in-line pre-recorded
voice prompts, pre-formatted voice prompts with
dynamically generated voice elements, key word accessible
anchor elements, voice responsive hyper links etc. are
embedded in voice web pages 103 through voice access
extensions to HTML. Behavioral intelligence including
command interpretation, page access, file caching, HVML
interpretation and user interaction is embedded in voice web
browser 106 (the HVML browser). Voice web browser 106
has the following states: (i) waiting for user commands; (ii)
active accessing and playing HVML pages; and (iii) paused
for user input.

free) service phone number and by then supplying their
account number via the telephone 111. In an alternative
embodiment, the services are publicly available and any user
placing a call into the system is processed as a subscriber
107 without requiring any registration.

FIG. 2A is a functional block diagram of a voice web
system 200 configured to provide voice web services to a
subscriber 107. Voice web system 200 includes one or more
voice web gateways 105 coupled to one or more service sites
202 via internet 101. Service site 200 is a voice web site 102
configured to provide voice web services. Each voice web
service is implemented using a collection of service agents
201 and service pages 203 centered around a service database
202. Additionally, service site 200 optionally includes
a personal profile 204 to be used to the extent that the service

Initially, voice web browser 106 is launched upon the 
system's receipt of a subscriber's telephone call. Once 
launched, voice web browser 106 goes through an initial- 
ization sequence that includes subscriber authentication and 
normally becomes "active" accessing and playing the sub- 
scriber's home page. Once the home page is played, voice 
web browser 106 "waits" for subscriber commands. As part 
of playing the page, the browser may "pause" for subscriber 
input and continue once the input is provided. 

Independent of any specific voice web page 103 that a 
subscriber may be accessing, voice web browser 106 pro- 
vides a set of navigational and operational commands. 
Within the telephone key pad, "*" and "#" are special keys 
that generate unique tones. Voice web browser 106 has 
special meaning for these keys. In general, the key 
followed by a sequence of touch tones, excluding the 
key, signals a browser command, an escape or a skip and the 
"#" key signals a link activation, termination of form input, 
termination of a key sequence or a selection. 

Voice Web Services 

Voice web system 100 can be used to provide voice web 
services to a subscriber 107. A voice web service is a service 
that provides on-line telephone based access to information. 
The information is presented to the user through the publi- 
cation of voice web pages 103. The information presented to 
(published for) the subscriber may be information retrieved 
from a single information source or a combination of 
information sources including publicly accessible on-line 
databases, information proprietary to voice web system 100, 
information previously stored by subscriber 107 or another 
information source. Exemplary services provided by voice 
web system 100 include (i) personal information services 
such as calendar, address book, electronic mail, voice mail, 
(ii) information services such as headline news, weather 
reports, sports score, stock portfolio quotes, business white 
pages, yellow pages, classified information and (iii) trans- 
action services (commerce services) such as banking, bill 
payments, stock trading, airline hotel and restaurant reser- 
vations and catalog store orders. 

Users gain access to voice web services by becoming 
voice web subscribers 107. Subscribers 107 preferably sign 
up (e.g. register) for services through a service provider. In 
one embodiment, each subscriber 107 is assigned a unique 
account number on a calling card and subscribers 107 access 
the voice web system 100 by dialing a single "800" (e.g. toll 
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being provided requires pre -stored subscriber-specific infor- 
mation (i.e. pre-stored information personal to the particular 
subscriber). 

Voice web service agents 201 are a type of service agent
110 (shown in FIG. 1) that execute on service site 102 to
provide voice web services to a subscriber 107. Voice web
service agents 201 are therefore scripts and programs
represented by a web page 103 (shown in FIG. 1).

Service database 202 is a database of service information. 
The content of the service information varies with the type 
of service being provided. For example, if voice web system 
100 is configured to deliver a business white page service, 
then service database 202 is a database of address and phone 
number listings for businesses. If voice web system 100 is 
additionally or alternatively configured to deliver news 
headlines, then voice web system 100 includes a service 
database 202 that includes current news headlines. 

Service forms and pages 203 are voice web pages 103 that
are HVML templates (voice forms and pages) that are "filled
in" in response to a specific subscriber request. Service
pages and forms 203 are used to gather subscriber input, to
retrieve information and to deliver (publish) information to
a subscriber. Some service pages 203 are database entry and
administration forms, some are database query forms and
others are database response pages. Entry forms are used to
add information to the database. Query forms are used to
extract information from the database. Response pages are
used to present retrieved information to the user. In the
preferred embodiment, service agents dynamically generate
service pages and forms 203 by retrieving requested data
from service database 202 and using the retrieved data in
place of corresponding variables stored in an HVML template.
The HVML templates link to each other, specifying
request-response dependencies. Thus, subscribers 107 are
able to enter and retrieve information in personal and
external databases over internet 101 using web protocols
without having to create a voice web page for each entry in
service database 202.
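
The template fill-in step described above can be sketched as
follows. This is only an illustration under stated assumptions:
the text does not give HVML syntax, so the "$name" placeholder
notation (Python's string.Template), the sample record and the
function name generate_page are all hypothetical.

```python
from string import Template

# Hypothetical response-page template. The $name placeholder syntax is
# an assumption made for this sketch; it is not HVML.
LISTING_TEMPLATE = Template(
    "Listing for $company. Address: $address. Phone: $phone."
)

# Stand-in for service database 202 of a business white page service.
SERVICE_DATABASE = {
    "acme": {
        "company": "Acme Tools",
        "address": "12 Main Street",
        "phone": "555-0100",
    },
}


def generate_page(key: str) -> str:
    """Retrieve one record and use it in place of the template variables."""
    record = SERVICE_DATABASE[key]
    return LISTING_TEMPLATE.substitute(record)


page = generate_page("acme")
print(page)
```

One template therefore serves every record in the database, which
is the point made above about not creating a page per entry.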

Service agent 201 typically uses a service database 202 
and a set of service pages and forms 203 to provide the 
corresponding voice web service. The service database 202 
hosts the information that subscribers 107 wish to access. 
The service forms allow subscribers 107 to input and query 
information in service database 202. Service pages allow 
service agents 201 to present the requested information to 
the subscriber 107 using voice web browser 106. 

FIG. 2B is a functional block diagram of an exemplary
calendar service. The calendar service agent 210 uses the
calendar database 211 together with the calendar and
appointment details input and query voice web forms 212
and appointment list and details voice web pages 213.
Subscribers fill in the calendar and appointment details input
voice web forms 212 to set their calendar appointments and
their details. The calendar service agent 210 processes the
submitted form and updates the calendar service database
211. Later, subscribers can retrieve their appointments for
any day by supplying 214 the month, date and year for that
day in the calendar query voice web form 212. The calendar
service agent 210 processes the submitted form, retrieves the
matching appointments from the calendar database, and
dynamically composes and returns the appointment list
voice web page 213. If the subscriber requests the details
of any appointment, the calendar service agent 210
dynamically generates and supplies the corresponding
appointment details page 213.
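
The request-response cycle of the calendar service can be sketched
as below. This is a minimal sketch, assuming an in-memory
dictionary in place of calendar database 211 and plain text in
place of an HVML page; the class and method names are
hypothetical.

```python
from collections import defaultdict
from datetime import date


class CalendarAgent:
    """Toy stand-in for calendar service agent 210."""

    def __init__(self):
        # Maps a day to its (time, details) entries; stands in for
        # calendar database 211.
        self._db = defaultdict(list)

    def submit_appointment(self, day: date, time: str, details: str) -> None:
        """Process a submitted input form by storing the appointment."""
        self._db[day].append((time, details))

    def appointment_list_page(self, month: int, day: int, year: int) -> str:
        """Process a query form and compose the appointment list page."""
        key = date(year, month, day)
        entries = sorted(self._db.get(key, []))
        if not entries:
            return f"You have no appointments on {key.isoformat()}."
        listing = "; ".join(f"{t}: {d}" for t, d in entries)
        return f"Appointments on {key.isoformat()}: {listing}"


agent = CalendarAgent()
agent.submit_appointment(date(1996, 11, 14), "14:30", "Design review")
agent.submit_appointment(date(1996, 11, 14), "09:00", "Staff meeting")
print(agent.appointment_list_page(11, 14, 1996))
```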

The Personal Voice Web 

FIG. 3 shows a personal voice web 300 in accordance
with the present invention. Personal voice web 300 is a
standardized collection of linked voice web pages and voice
web forms (a special type of voice web page) that form a
personal service space for the subscriber. Preferably, all
subscribers share a common structure of linked voice web
pages although the contents of personal voice web pages
vary from subscriber to subscriber. Because each subscriber
of the personal voice web system 300 has the linked page
structure shown in FIG. 3, subscribers navigate about and
access information from their personal voice web 300 in a
standardized way. Each page in personal voice web 300
includes an agent that performs various processing tasks
required for each respective page. At the root of personal
voice web 300 is the personal home page 301. Personal
home page 301 links to a personal profile page 302, a
personal administrative assistant page 303, a personal
helpdesk page 304, and a personal commerce page 305.

The personal administrative assistant page 303 is linked to 
a number of personalized voice web services (service pages) 
330 including, by way of an example, a calendar and 
appointments page 309, an address book page 310, a stock 
portfolio page 311, a news headlines page 312, a mail box 
page 313, and a business white pages home page 314. 

Calendar and appointments page 309 is used to provide an 
appointments service. The appointments service enables a 
subscriber to track personal and business appointments in a 
voice-based calendar. The subscriber thus adds and retrieves 
appointments over the phone using personal voice web 300. 
In addition to providing day and time information related to
stored appointments, a subscriber may also store voice note
annotations that are associated with a particular appointment.

Address book page 310 is used to provide an address 
service. The address service enables a subscriber to add and 
retrieve address, phone number, and other information 
related to individual names or company names. The information
added and retrieved is stored in an address book
service database private to the subscriber.

Stock portfolio page 311 is used to provide a stock quote
service. The stock service enables a subscriber to retrieve
current stock pricing and portfolio valuation information as
well as statistical information related to changes in portfolio
[...] retrieved from a stock portfolio service database private
to the subscriber and additionally retrieves current stock
pricing information from an on-line database or information
source.

News headlines page 312 is used to provide a news
service. The news service enables a subscriber to retrieve
news headlines related to subscriber customized topics.

Mail box page 313 is used to provide a mailbox service.
The mailbox service enables a subscriber to access
electronic mail (e-mail) messages. The e-mail messages are
played for the subscriber using text to speech conversion and
a speech synthesizer.

Business white pages home page 314 is used to provide a
white page service. The white page service enables a
subscriber to enter a partial company name, and optionally a
city name and state code, to retrieve the company's full name,
address and phone number.

Each service page 309-314 is part of a collection of voice
forms and pages that are used by the corresponding service
agent to retrieve a request from the subscriber, generate an
appropriate database query responsive to the subscriber
request, retrieve subscriber-requested information, and
generate a voice web page that incorporates the retrieved
information and that is adapted for presentation
(publication) to the subscriber using a voice web browser.
Thus, for example, the service agent associated with calendar
and appointments page 309 generates a voice form for
prompting a subscriber for month, day and year information.
After receiving the prompted information, the calendar and
appointments service agent generates the appropriate query
to extract the requested calendar information from a
calendar service database. Once the calendar information is
retrieved from the database, the calendar and appointments
service agent generates a voice web page that includes the
retrieved information. The new page is then presented
(published) to the subscriber over the telephone by the voice
web browser.

Each of the other personal service agents associated with
personal service pages 308-327 operates in a similar way to
provide a subscriber with information retrieved from
associated service databases.

Personal helpdesk page 304 is linked to personal voice
web helpdesk service pages 331 including, by way of
example, a hotels page 315, an airlines page 316, a rental
cars page 317, a travel agents page 318, a restaurants page
319, a financial services page 320, and a banks page 321.
The personal helpdesk page has an associated personal
helpdesk agent that is used to provide a set of helpdesk
services. Helpdesk services enable a subscriber to access
product, pricing, availability and other information of the
corresponding services.

Hotels page 315 is used to provide a hotel reservation
service. Airlines page 316 is used to provide an airline
booking service. Rental cars page 317 is used to provide a
rental car reservation service. Travel agents page 318 is used
to provide a travel service. Restaurants page 319 is used to
provide a menu and reservations service. Financial services
page 320 is used to provide a financial service. Banks page
321 is used to provide a bank service.

Personal commerce page 305 is linked to personal voice
web commerce service pages 332 including, by way of
example, an apparel shops page 322, a luggage stores page
323, a gift shops page 324, a flower shops page 325, an office
supplies stores page 326, and a book stores page 327. The
personal commerce page provides commerce services that
[...] various retail establishments. As part of the commerce
service, the personal voice web allows a subscriber to shop
in various catalogs and then submit orders for selected items
directly to the sponsor of the associated catalog. Orders are
submitted to the catalog sponsor either as a voice web form
or conventional web form sent to the sponsor, as an
electronic message or using another means.

Personal profile page 302 links to a set of personalized
voice web profile pages including an authentication page
306, a speech profile page 307, and an attributes and
preferences page 308.

User authentication page 306 contains authenticating 
information including a subscriber account number, an 
encrypted password or personal identification number and 
links to a voice authentication signature MIME resource.

Speech profile page 307 is linked to a hierarchy of speech
training pages that correspond to the hierarchy of personal 
voice web 300. FIG. 4 shows the hierarchy 400 of speech 
training pages 401-427. Speech training pages 401-427 are 
sets of pre-captured training files to be used in performing 
speaker dependent speech recognition in providing the cor- 
responding service to a subscriber. Each speech training 
page is thus accessed by the corresponding agent in per- 
forming the corresponding service. For example, the admin- 
istrative assistant service accesses administrative speech 
training set 431 (including speech training pages 409-414). 
The helpdesk service accesses the helpdesk training page set 
432 (including speech training pages 415-421). The com- 
merce service accesses the commerce training page set 433 
(including speech training pages 422-427). 

Each speech training page 401-427 includes training data 
specifically tailored to the words more commonly associated 
with the corresponding service. For example, the calendar 
speech training page 409 includes training vocabulary to aid 
in the recognition of voice commands such as "Tenth", 
"November", "Tuesday" and so forth. 

Referring now again to FIG. 3, personal attributes and 
preferences page 308 includes subscriber attribute informa- 
tion including name, account number, address, voice tele- 
phone number, fax telephone number, paging telephone 
number, encrypted credit card numbers and the like as well 
as personal preference information such as configuration, 
selection and presentation preferences. Personal attributes 
and preferences page 308 is also linked to a hierarchy of
attribute and preferences pages (shown in FIG. 5) that 
correspond to the hierarchy of personal voice web 300. 

FIG. 5 shows the hierarchy of attributes and preferences
pages 501-527 associated with personal attributes and
preferences page 308. Attributes and preferences pages 501-527
are pages that store subscriber-specific preference
information to be used in providing the corresponding service to a
subscriber. Each attributes and preferences page 501-527 is
thus accessed by the corresponding agent in performing the
corresponding service. For example, the administrative
assistant service accesses attributes and preferences set 531
[...]. The helpdesk service accesses the helpdesk attributes and
preferences set 532 (including attributes and preferences pages
514-521). The commerce service accesses the commerce
training page set 543 (including attributes and preferences
pages 522-527).

It should be noted that the user profile information for
multiple subscribers is stored in user profile databases. The
user profile databases are accessed by service dependent
profile agents. For example, personal identification and
verification information of multiple subscribers is stored in
a user profile home page database (a service database) and
accessed by the subscriber's profile home page agent.
Calendar attributes and preferences information for multiple
subscribers is stored in the subscriber calendar attributes and
preferences profile database (a service database). Calendar
service specific speech training information for multiple
subscribers is stored in the subscriber calendar speech
training profile database (a service database). The calendar
service profile agent responds to HTTP form requests for
calendar attributes and preferences or calendar speech
training profile page information for any particular subscriber
and supplies the appropriate subscriber profile page
information as HVML voice web pages.

The collection of profile pages for a single user constitutes
that user's personal voice web profile 300. Personal voice
web profile 300 need not be a collection of static HVML
pages (voice web pages), but may instead be generated
dynamically using user profile page databases. However, once
generated, these profile pages can be reused from various
cache systems within the voice web system without having
to retrieve them from their original databases, thus saving
significant time and resources.

In operation, a personal voice web service agent uses a
corresponding service profile agent to retrieve subscriber
and service specific attributes and preferences, speech
training profiles and other information from the corresponding
service profile database. The personal voice web service
agent uses the retrieved subscriber and service specific
information in personalizing the voice web service forms
and pages as well as in enhancing and improving speech
recognition by embedding the speech training profiles in the
corresponding voice web forms and pages.

Referring back to FIG. 2B, for example, the calendar
service agent 210 uses a corresponding calendar service
profile agent 215 to retrieve subscriber specific calendar
attributes and preferences included in profile database 216
by specifying the subscriber's calendar attributes and
preferences profile URL as part of a profile request web form.
Calendar service profile agent 215 responds to the submitted
web form, retrieves the requested subscriber information
from the calendar service profile database 216 and delivers
it to calendar service agent 210 as a table formatted web
page. Calendar service agent 210 retrieves the requested
information from the table format in the web page and uses
the subscriber's attributes and preferences to customize the
voice web service form and page templates 213 before
presenting them to the subscriber. In this way, the subscriber
can have a personalized form or page presented to him/her
without having to supply information about himself/herself
repeatedly in each call.

Similarly, calendar service agent 210 uses a corresponding
calendar service profile agent 215 to retrieve subscriber
specific calendar speech training profiles from profile
database 216 by specifying the subscriber's calendar speech
training profile URL as part of a profile request web form.
Calendar service profile agent 215 responds to the submitted
web form, retrieves the requested subscriber information
from the calendar service profile database 216 and delivers
it to the calendar service agent 210 as a table formatted web
page. The calendar service agent 210 retrieves the requested
information from the table format in the web page and
embeds the subscriber's speech training profiles in the voice
web form and page templates (pages 212, 213) before
delivering them to the voice web browser. The voice web browser
uses these speech training profiles to dynamically change the
active vocabulary in the voice processing software and
hardware thereby customizing it to the subscriber.
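
In both exchanges above the profile agent answers with a table
formatted web page, which the service agent then reads back into
usable values. A minimal sketch of that read-back step follows.
The two-column name/value layout, the sample attribute names and
the function parse_profile are assumptions made for illustration;
the text does not specify the table layout.

```python
from html.parser import HTMLParser


class ProfileTableParser(HTMLParser):
    """Collects the cell text of each table row in a profile page."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_data(self, data):
        # Only text inside a <td> cell belongs to the profile.
        if self._in_cell:
            self._row[-1] += data

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append([cell.strip() for cell in self._row])
            self._row = None


def parse_profile(page: str) -> dict:
    """Turn a two-column (name, value) table page into a dictionary."""
    parser = ProfileTableParser()
    parser.feed(page)
    return {row[0]: row[1] for row in parser.rows if len(row) == 2}


# Hypothetical page as the profile agent might return it.
SAMPLE_PAGE = """
<table>
<tr><td>greeting_name</td><td>Pat</td></tr>
<tr><td>time_format</td><td>12-hour</td></tr>
</table>
"""

profile = parse_profile(SAMPLE_PAGE)
print(profile)
```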

FIG. 2C is a functional block diagram of an alternative
configuration of a voice web system in accordance with the
present invention. The system includes a computer configured
as a combined voice gateway and voice web site
(combined site) 220. Combined site 220 includes gateway
components such as a voice and telephony interface 114, a
voice web browser 106 and server software 112. Combined
site 220 additionally includes voice web site components
such as service agents 201, service database 202 and service
forms and pages 203. Combined site 220 provides voice
web access to a subscriber 107 coupled to the combined site
220 via the PSTN 109. Because the voice gateway and voice
web site functions are combined within a single computer
environment, the server software 112 (located in combined
site 220) and the voice web browser 106 exchange files
without suffering the delays imposed by routing across the
Internet 101. In certain applications, for example when a
subscriber is accessing personal databases, this configuration
is advantageous to improve system performance. It should
be noted, however, that even though server software 112
(located on combined site 220) and voice web browser 106
exchange files using a local interface as opposed to Internet
101, they nonetheless exchange files in accordance with
HTTP.

Voice web browser 106 communicates with other web
sites (such as web sites 224 and 225) using Internet 101.
Web site 224 is a computer coupled to Internet 101
configured with server software 112, service agents 201, service
database 202 and service forms and pages 203. Web site 224
is configured to deliver voice web services as described in
reference to FIGS. 2A and 2B.

Web site 225 is a computer configured with server
software 112, a profile service agent 223, service forms and
pages 222 and profile database 221. Web site 225 is a
universally accessible profile web site that is accessed by
any other web site or web gateway in the voice web system
as long as the accessing web site or web gateway has the
appropriate URL information. Web site 225 provides user
profile information to web site agents (such as service agents
201) located on other web sites (such as web site 224 and
combined site 220). Advantageously, any web site and/or
web gateway can thus access information stored in the
profiles database 216 by hyperlinking to the web page
associated with profile service agent 215.

User Authentication and Verification

Personal voice web system 300 uses a login agent as a
gatekeeper to the access of each subscriber's personal voice
web. The login agent is a distributed software program that
can receive subscriber information over a telephone, access
the subscriber's personal profile pages from the subscriber's
personal voice web and verify the subscriber's credentials
over the telephone.

Each system subscriber is given (i) an account number, (ii)
a personal identification number (PIN) and (iii) a service
calling number. In order to access a personal voice web, the
subscriber calls the service calling number and uses account
information and the PIN to initiate a subscriber
authentication process. FIG. 6 is a flow diagram of a subscriber
authentication method 600 in accordance with the present
invention. The subscriber authentication method 600
includes authentication signature creation form processing
and subscriber authentication processing.

A subscriber initiates access 601 of his or her personal
voice web 300 by calling the service calling number using
a conventional telephone or a similar voice activated device
or computer configured to access the public telephone network.
After the subscriber initiates access 601, a login agent starts
login processing 602.

During login processing 602, the login agent answers the
call and presents a standard login form to the subscriber. A
login form is a voice form for collecting and submitting
login information including subscriber account number and
the subscriber PIN. After a subscriber enters the login
information (into the login form) and submits the login form,
the login agent uses the login information to retrieve the
URL of the subscriber's personal voice web home page 301.
The login agent retrieves the URL by looking up the
subscriber's account number in the voice web subscriber
directory. The login agent additionally verifies the PIN
which was submitted. Upon verification of the PIN, the login
agent presents 603 the subscriber's voice authentication
form to the subscriber over the telephone. As part of the
presentation, the login agent requests the subscriber to
supply a personalized voice authentication sample. The
login agent then waits 604 for the subscriber to supply the
sample and submit 605 the form. After the subscriber
submits 605 the form, the login agent processes 606 the
submitted form. During processing 606 of the submitted
form, the login agent accesses the subscriber's personal
authentication page from the subscriber's personal voice
web profile (linked to the subscriber's home page) and
attempts to retrieve the voice authentication signature. If this
is the first time the subscriber is accessing the service, the
signature will be missing from the subscriber's
authentication page. In this case, the login agent presents 607 the
authentication signature creation form to the subscriber.

Using the options presented in the signature creation 
form, the subscriber selects the option to create or modify 
the personal voice authentication signature. 

Following the instructions provided by the login agent, 
the subscriber fills in 608 the voice authentication signature 
creation form and records a personalized voice phrase as an 
authentication signature. After filling in 608 the signature 
creation form, the subscriber submits the form to the login 
agent. The login agent waits until the signature creation form 
is submitted 609. The login agent then processes 610 the 
recorded phrase converting it into a signature pattern and 
linking it to the user authentication page as a MIME resource 
for future verification. 

If, however, after processing 606, the login agent determines
that there is an authentication signature stored in the
subscriber's personal profile, then the login agent performs a
test 611 to determine whether there is a match between the 
stored authentication signature and the voice sample sub- 
mitted by the subscriber. If test 611 determines that there is 
a match between the sample and the signature, then the 
subscriber is given access to the personal voice web and the 
voice web. Test 611 uses conventional voice authentication 
methods. A "match" is determined by test 611 when the 
conventional voice authentication method determines that 
the speaker's voice print or voice signature matches a master 
stored voice print or voice signature within a specified 
tolerance. If, however, the test determines that there is not a 
match between the sample and the signature, then the 
subscriber is denied access 613. 
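
The decision logic of authentication method 600 can be sketched as
follows. This is only an illustration under stated assumptions:
voice samples are represented as short numeric feature vectors,
the "match within a specified tolerance" of test 611 is
approximated by a cosine similarity threshold, and the PIN is
compared in plain form for brevity. None of these choices comes
from the text, which relies on conventional voice authentication
methods and an encrypted PIN.

```python
import math


def similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def login(directory, account, pin, sample, tolerance=0.9):
    """Return 'granted', 'denied' or 'signature_created' for one call."""
    entry = directory.get(account)
    # Login processing: look up the account and verify the PIN.
    if entry is None or entry["pin"] != pin:
        return "denied"
    # First access: no signature stored yet, so record one.
    if entry["signature"] is None:
        entry["signature"] = list(sample)
        return "signature_created"
    # Test 611 stand-in: compare the sample with the stored signature.
    if similarity(entry["signature"], sample) >= tolerance:
        return "granted"
    return "denied"


# Hypothetical subscriber directory with one account and no signature yet.
directory = {"1001": {"pin": "4321", "signature": None}}

first = login(directory, "1001", "4321", [1.0, 0.0, 0.5])
second = login(directory, "1001", "4321", [1.0, 0.0, 0.5])
impostor = login(directory, "1001", "4321", [0.0, 1.0, 0.0])
bad_pin = login(directory, "1001", "0000", [1.0, 0.0, 0.5])
print(first, second, impostor, bad_pin)
```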

Enhanced Speech Recognition

Automatic speech recognition falls into three categories:
speaker dependent, speaker adaptive, and speaker independent.
A speaker dependent system is developed to work for
a single speaker; such systems are usually easier to develop,
cheaper to buy and more accurate, but require the use of
user-specific speech training files.

The size of the vocabulary of a speech recognition system
affects the complexity, processing requirements and the
accuracy of the system. Referring now again to FIG. 3,
personal voice web 300 uses small to medium sized
vocabularies (tens to hundreds of words).

An isolated-word or discrete speech system operates on
single words at a time, requiring a pause between each word
utterance. This conventional type of speech recognition is a
simple form of recognition to perform because the end
points are easier to find and the pronunciation of a word
tends not to affect others. As the occurrences of the words
are more consistent and sharply delimited they are easier to
recognize. Personal voice web 300 focuses on discrete
speech and in particular on speech used for command and
control.

Personal voice web 300 typically uses speech sampled at 8
kHz using 8 bit samples, resulting in 64 kbps bandwidth and
storage. Conventional adaptive differential pulse code
modulation (ADPCM) techniques can reduce the bandwidth to
16 kbps without significant loss of information.
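
The figures above follow from simple arithmetic, worked through
below. The one-minute recording length and the helper name
storage_bytes are illustrative assumptions.

```python
SAMPLE_RATE_HZ = 8_000   # samples per second
BITS_PER_SAMPLE = 8

# Uncompressed rate: 8,000 samples/s x 8 bits = 64,000 bits/s.
pcm_bps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE

# A 16 kbps coded stream leaves 2 bits per sample on average.
adpcm_bps = 16_000
adpcm_bits_per_sample = adpcm_bps / SAMPLE_RATE_HZ


def storage_bytes(seconds: int, bits_per_second: int) -> int:
    """Bytes needed to store a recording of the given length."""
    return seconds * bits_per_second // 8


print(pcm_bps, adpcm_bits_per_sample)
print(storage_bytes(60, pcm_bps), storage_bytes(60, adpcm_bps))
```

A one-minute voice note thus takes 480,000 bytes at 64 kbps and
120,000 bytes at 16 kbps, a fourfold saving.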

Personal voice web 300 uses conventional speaker depen- 
dent recognition of discrete speech. This conventional 
speaker dependent recognition relies on digital sampling of 
the word utterances. After sampling, the next stage is 
acoustic signal processing. Most techniques include spectral 
analysis. This is followed by recognition of phonemes, 
groups of phonemes and words. This stage uses many 
conventional processes such as Dynamic Time Warping, 
Hidden Markov Modeling, Neural Networks, expert systems
and combinations of techniques. Hidden Markov Modeling
based techniques are commonly used and generally the most
successful approach. Additionally, personal voice web 300
uses some knowledge of the language to aid the recognition 
process. 
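
Of the conventional processes named above, Dynamic Time Warping
is the simplest to show in a few lines. The sketch below is a
textbook form of the technique, not the recognizer used by
personal voice web 300. It assumes each utterance has already
been reduced to a sequence of single numeric features, and the
template values are invented for illustration.

```python
def dtw_distance(a, b):
    """Dynamic Time Warping distance between two feature sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the best alignment cost of a[:i] against b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # a advances, b holds
                cost[i][j - 1],      # b advances, a holds
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def recognize(utterance, templates):
    """Return the word whose stored template is closest to the utterance."""
    return min(templates, key=lambda w: dtw_distance(utterance, templates[w]))


# Hypothetical speaker dependent templates for two command words.
templates = {"yes": [1, 3, 4, 3, 1], "no": [5, 5, 2, 1, 1]}

# The same "yes" contour spoken more slowly (some frames repeated).
slow_yes = [1, 1, 3, 4, 4, 3, 1]
print(recognize(slow_yes, templates))
```

Because the warping absorbs differences in speaking rate, the
slower utterance still aligns with its template at zero cost.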

Personal voice web 300 improves speaker dependent 
recognition of discrete speech in a command and control 
context using universally accessible personal speech train- 
ing profiles 401-427. As described above, the personal 
speech training pages 401-427 are organized as a linked 
collection of voice web profile pages each linked to the 
corresponding personal voice web service page. Thus, the 
personal speech training profile pages parallel the personal 
voice web service pages in structure as shown in FIGS. 3 and 
5. Each speech training page 401-427 contains the training 
vocabulary for browser command and control that is context 
dependent. 

Each service page 301-327 linked to the personal voice
web home page 401 has a corresponding speech training 
page 402-427. The personal voice web 300 is constructed in 
such a way that each voice web service page 302-327 links 
to its corresponding speech training page 401-427 using its 
URL. As the subscriber navigates from service page to 
service page in the personal voice web 300, the system is 
able to access the corresponding speech training page using 
its embedded URL. 

Each speech training page 401-427 contains a set of 
command and control key words and their personalized 
speech recognition patterns representing the context sensi- 
tive vocabulary for the corresponding service page. For 
example, the calendar and appointments service page 309 is 
linked to a corresponding speech training page 409 contain- 
ing key words and recognition patterns for "year", "month", 
"day", the names of the months and days, digits representing 
dates and times etc. Similarly, stock portfolio page 311 is 
linked to a corresponding speech training page 411 contain- 
ing key words and recognition patterns for "stock", "quote", 
"volume", "option", "symbol", names of companies in the 
portfolio etc. 
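
The effect of a context sensitive vocabulary can be sketched as
follows. This is a minimal sketch, assuming recognition has
already produced a candidate word, so that only the vocabulary
switch is shown; the browser command words and the dictionary
layout are hypothetical, while the calendar and stock key words
are taken from the examples above.

```python
# Hypothetical browser-wide commands, active on every page.
BROWSER_COMMANDS = {"help", "back", "home"}

# Key words held by the speech training page of each service page.
TRAINING_PAGES = {
    "calendar": {"year", "month", "day", "november", "tuesday", "tenth"},
    "stock": {"stock", "quote", "volume", "option", "symbol"},
}


def active_vocabulary(service_page: str) -> set:
    """Vocabulary in force while the given service page is active."""
    return BROWSER_COMMANDS | TRAINING_PAGES.get(service_page, set())


def accept(word: str, service_page: str):
    """Return the word if it is in the active vocabulary, else None."""
    candidate = word.lower()
    return candidate if candidate in active_vocabulary(service_page) else None


print(accept("Quote", "stock"), accept("quote", "calendar"))
```

Keeping the active set small per page is what lets a small to
medium sized vocabulary cover the whole personal voice web.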

FIG. 7 is a flow diagram of a speech recognition process
700 in accordance with the present invention. The process is
initiated after a subscriber has gained access 701 to the
personal voice web in accordance with the process described
in reference to FIG. 6. Once the subscriber gains access to
the personal voice web 701, the login agent accesses the
subscriber's personal voice web home page and presents 702
the home page to the subscriber over the phone. During the
process of presenting 702 the home page, the login agent
loads the personal voice web profile page 302 and the speech
profile page 501 containing the command and control
vocabulary for the home page. This vocabulary includes the
basic voice web browser command and control as well as
home page specific command and control. From the home
page, the subscriber requests a particular service (i.e. personal
administrative assistant, the personal helpdesk or the
personal catalog store). The home page agent determines
703 what service the subscriber has selected and in response,
invokes 704 the selected service and then proceeds to deliver
705 the service. During invocation 704 of the service, both
the service page and the speech training page associated
with the service page are loaded on the voice web gateway
where the voice web browser uses them to deliver the
service and improve speech recognition.

During delivery 705 of the selected service, the service
agent uses the speech training page associated with the
selected service to recognize voice commands submitted
720 by the subscriber. Specifically, the service agent obtains
the speech training profile, embeds it in the service page as
a MIME resource and forwards it to the voice web browser
which uses the training profiles to improve recognition.
Thus, responding to the subscriber's voice commands pertinent
to the accessed voice web service page, the voice web
browser recognizes the command and control word utterances
(the subscriber's voice commands that are submitted
720) and matches them against the personalized vocabulary
in the corresponding voice web speech training page for
accurate speaker dependent recognition of discrete speech.

If the subscriber requests access to a new service page
linked to a currently accessible service page, the currently
active service agent exits 706 the current service and then
invokes 704 the requested service. During the invocation of
the requested service, the requested voice web service page
corresponding to the requested service is loaded as well as
the corresponding speech training page containing the
matching command and control vocabulary. In this process
700, the active service agent always uses the most appropriate
vocabulary for the existing context thereby greatly
reducing the size of the active vocabulary that needs to be
accessed while significantly improving the speaker dependent
recognition.
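
The context switching described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `SpeechTrainingPage` structure, the training page URLs and the `BROWSER_COMMANDS` subset are hypothetical, and plain text comparison stands in for the acoustic matching a real speaker dependent recognizer would perform.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class SpeechTrainingPage:
    """Context-sensitive vocabulary for one service page (hypothetical structure)."""
    url: str
    # keyword -> personalized recognition pattern (opaque bytes in this sketch)
    patterns: Dict[str, bytes] = field(default_factory=dict)


# Basic browser commands that stay active on every page (illustrative subset).
BROWSER_COMMANDS = {"home", "help", "exit"}


def active_vocabulary(training_page: SpeechTrainingPage) -> Set[str]:
    """Return the keywords the recognizer listens for in the current context."""
    return BROWSER_COMMANDS | set(training_page.patterns)


def recognize(utterance: str, training_page: SpeechTrainingPage) -> Optional[str]:
    """Match an utterance against the active vocabulary only.

    Text comparison is a stand-in for matching against the stored patterns.
    """
    word = utterance.strip().lower()
    return word if word in active_vocabulary(training_page) else None


calendar_training = SpeechTrainingPage(
    url="myweb/training/calendar.vml",  # hypothetical URL
    patterns={"year": b"", "month": b"", "day": b""},
)
stock_training = SpeechTrainingPage(
    url="myweb/training/stocks.vml",  # hypothetical URL
    patterns={"stock": b"", "quote": b"", "volume": b""},
)
```

With the calendar page active, "quote" is outside the active vocabulary and is rejected; once the stock portfolio training page is loaded, the same word is accepted. Keeping each vocabulary this small is the point of the per-page training pages.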

Query Localization and Customization

Query customization uses stored subscriber attributes and
preferences to customize queries of service databases. Query
customization is accomplished by maintaining user
attributes and preferences in a collection of voice web pages
501-527 (described above in reference to FIG. 5) that
parallel the corresponding voice web service pages 301-327
(described above in reference to FIG. 6) and using the
attribute and preferences information corresponding to the
service requested to customize the query parameters within
forms.

Referring now again to FIG. 5, the attributes and preferences
pages 501-527 parallel the personal voice web service
pages 301-327 in structure as shown in FIG. 3. Each service
page linked to the personal voice web home page 301 has a
corresponding voice web attributes and preferences page
linked to it. The personal voice web 300 is constructed in
such a way that each voice web service page 301-327 links
to its corresponding voice web attributes and preferences
page 501-527 using its URL. As the subscriber navigates
from service page to service page in the personal voice web
300, the system is able to access the corresponding voice
web attributes and preferences page using its embedded
URL.

A subscriber of voice web services requests information 
by accessing a voice web service page and having it played 
by the corresponding agent (i.e. administrative assistant, 
helpdesk or commerce agent). The subscriber requests ser- 
vice through submitting a query form presented by the 
corresponding agent. The query form is an HVML form for 
touch tone and voice data input. When a service is requested 
by the subscriber, the agent retrieves the corresponding 
voice web attributes and preferences page and automatically 
fills the query form with appropriate default parameters 
obtained from the subscriber's attributes and preferences. 
For example if the subscriber is accessing the weather 
service page, the agent fills in the subscriber's home town 
and other chosen cities automatically from the subscriber's 
attributes and preferences page. Similarly, if the subscriber
is accessing the stock portfolio service page, the agent
accesses the corresponding attributes and preferences page
and fills in the subscriber's chosen portfolio of stocks in the
query form. In addition, the agent also automatically fills in
the appropriate subscriber attributes such as his/her access
account number, password etc., thereby easing the subscriber's
access while exploiting the availability of services through
web based queries.
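
The default-filling step described above can be sketched as follows. The field names (`home_town`, `other_cities`, `forecast_date`) and the city values are assumptions made for illustration; the specification does not name the fields of the weather query form.

```python
def prefill_query_form(form_fields, preferences):
    """Split a query form into fields filled from stored defaults and fields
    the agent still has to ask the subscriber for."""
    filled = {name: preferences[name] for name in form_fields if name in preferences}
    remaining = [name for name in form_fields if name not in preferences]
    return filled, remaining


# Hypothetical weather query form and attributes and preferences data.
weather_form = ["home_town", "other_cities", "forecast_date"]
weather_prefs = {"home_town": "Boston", "other_cities": ["Chicago", "Denver"]}

filled, remaining = prefill_query_form(weather_form, weather_prefs)
```

Here the agent would only need to prompt for `forecast_date`, since the other two fields come from the subscriber's attributes and preferences page.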

FIG. 8 is a flow diagram of a query customization process 
800 in accordance with the present invention. The process is 
initiated after a subscriber has gained access 801 to the 
personal voice web in accordance with the process described 
in reference to FIG. 6. Once the subscriber gains access 801 
to the personal voice web, the login agent accesses the 
subscriber's personal voice web home page and presents 802 
the home page to the subscriber over the phone. 

During the process of presenting 802 the home page, the 
login agent loads the attributes and preferences page 501 
from the subscriber's voice web personal profile. Attributes 
and preferences page 501 contains preferences for the home 
page 301. From the home page 301, the subscriber accesses 
the targeted voice web service page by navigating the 
appropriate hyper links from the voice web home page 301. 
In response, the selected service is invoked 803 and the 
selected service then proceeds to deliver 804 the service. 
During invocation 803 of the selected service, both the 
service page and the attributes and preferences page asso- 
ciated with the service page are extracted by the service 
agent. 

During delivery 804 of the selected service, the service 
agent uses the attributes and preferences page associated 
with the selected service to customize queries of the asso- 
ciated service database. More specifically, using the 
attributes and preferences information, the service agent 
automatically fills in the needed fields in the corresponding 
query form with user specified defaults and preferences. 
Having filled the appropriate fields, the service agent plays 
the remaining query form to the subscriber thereby greatly 
reducing the information that the subscriber has to supply on 
the telephone. The service agent then obtains the remaining 
information, if any, from the subscriber and submits the 
query form to the service database. When the results are 
returned (i.e. the information is retrieved from the service 
database), the service agent plays the results to the
subscriber over the telephone.

Form Based Voice Web Page Publishing

In another aspect of the invention, voice web system 100 
enables publishers to compose voice web forms and pages 
statically using ordinary word processing programs and link 
them to voice files created using ordinary audio capture and 
editing tools available on personal computers and worksta- 
tions. Alternatively, voice web agents can dynamically com- 
pose voice web pages and forms based on user requests and 
optionally profiles as well as accessed databases and ser- 
vices. Advantageously, dynamic form-based publication 
enables information and service providers to publish voice 
web pages using the conventional telephone without the 
need for any additional computer based voice web publish- 
ing tools. Dynamic form-based publication is achieved by 
combining voice web publishing forms, voice web publish- 
ing agents and voice web page publishing templates. 

FIG. 9 is a flow diagram of a voice publishing method in 
accordance with the present invention. The method presents 
901 a voice web form to a caller calling into a voice web 
system using a conventional telephone. Voice web publish- 
ing forms are specially designed voice web forms that when 
interpreted (i.e. when played back) using the voice browser 
prompt the caller (the voice information publishers) to input 
voice and touch tone based input using a telephone. The 
forms guide the caller step by step to supply the needed 
information, edit and modify the information and finally 
submit 903 the information for processing 902. 

Voice web publishing agents process 902 the filled voice
web publishing forms, extracting and separating voice information
and touch tone input. Based on the touch tone inputs,
the agents may present additional publishing forms to the 
caller (publisher). The voice information is stored 904 in 
voice files and linked to the corresponding voice web page 
publishing template by substituting variables within the 
page template with the generated files. The touch tone input 
is used whenever the caller (publisher) needs to input 
alphanumeric information that can be processed by the 
publishing agent. 

Voice Web White, Yellow and Order Pages 

Without limiting the general applicability of form based 
voice web page publishing, a specific application of the 
process of form-based publishing is next described. The 
exemplary form based publishing process relates to the 
publication of voice web business white pages, yellow pages
and order entry pages. FIG. 10 shows a white-yellow-order 
page system 1000 in accordance with the present invention. 
Voice web business white pages 1001 are voice web pages 
that are dynamically composed by the voice web business 
white pages agent 1003 from a business white page database 
1002 information including the name, address, phone num- 
ber of businesses. The white pages agent 1003 presents a 
search form to a caller for specifying the name of the 
business and allows further narrowing of the search by city 
and state. Each business white page can be linked to a 
corresponding business yellow page 1004. Business yellow 
pages 1004 contain additional information about the busi- 
ness including a tag line, advertisement, directions, working 
hours, and promotions. In addition, each yellow page 1004 
can be linked to a corresponding business order entry form 
1005. Business order entry forms 1005 allow users to order 
products and services or transact business by specifying 
product or service codes, preferences, quantity, and credit 
card numbers for payment. 

A participating business can publish a voice web yellow
page 1004 by simply filing a corresponding voice web
yellow page publishing form 1007. A yellow page publishing
agent 1006 processes the yellow page publishing form
1007 and dynamically generates a business yellow page
1004 for that business from a standard yellow page template
by replacing variables in the template with values supplied
by the submitted yellow page publishing form.

The yellow page publishing agent 1006 (a publishing
agent) presents a yellow page voice web publishing form
1007 to the participating business. Voice web publishing
forms are specially designed voice web forms that when
interpreted (i.e. when played back) using the voice browser
prompt the caller (the voice information publishers) to input
voice and touch tone based input using a telephone. Yellow
page publishing form 1007 guides the caller step by step to
supply the needed information, edit and modify the information
and finally submit the information for processing, as
described in reference to FIG. 9. Specifically, yellow page
publishing form 1007 prompts for voice information including
name, tag line, advertisement, directions, working hours
and promotions. In addition, the yellow page publishing
agent 1006 prompts for touch tone input including the
account number, password, phone number, yellow page
category code and credit card number. Yellow page publishing
agent 1006 uses the account number to identify the
business, the password to verify the business, the phone
number to link it to the corresponding white page, the yellow
page category code to classify the business within business
yellow pages, and the credit card number to pay for the
business yellow page. Once the business is identified and
verified, yellow page publishing agent 1006 dynamically
creates a business yellow page 1004 from a standard template
for the appropriate category. Yellow page publishing
agent 1006 uses the supplied business phone number to
match with the appropriate database entry in the business
white pages and updates it with the URL of the newly
created yellow page to link it.
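
The template substitution performed by the publishing agent can be illustrated with Python's standard `string.Template`. The template text, the variable names (`name_voice`, `tag_line_voice`, `directions_voice`) and the file paths are hypothetical; the specification only states that variables in a standard template are replaced with the generated voice files.

```python
from string import Template

# Hypothetical yellow page template: each $variable stands for a voice file
# recorded through the publishing form.
YELLOW_PAGE_TEMPLATE = Template(
    '<VOICE TYPE="File" SRC="$name_voice">\n'
    '<VOICE TYPE="File" SRC="$tag_line_voice">\n'
    '<VOICE TYPE="File" SRC="$directions_voice">\n'
)


def publish_yellow_page(template, voice_files):
    """Replace each template variable with the path of a stored voice file.

    Template.substitute raises KeyError when a variable has no value, so an
    incomplete form is rejected instead of producing a broken page.
    """
    return template.substitute(voice_files)


page = publish_yellow_page(
    YELLOW_PAGE_TEMPLATE,
    {
        "name_voice": "acme/name.wav",          # illustrative paths
        "tag_line_voice": "acme/tagline.wav",
        "directions_voice": "acme/directions.wav",
    },
)
```

Failing on a missing variable is a design choice of this sketch; an agent could instead present an additional publishing form to collect the missing recording.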

A very similar process occurs for publishing order entry
forms. A business order entry form publishing agent, order
page publishing agent 1008, presents an appropriate order
entry publishing form 1009 to a participating business.
Order page publishing agent 1008 requests appropriate
customized prompts for specific fields in the business order
entry form such as product or service code, customer
preferences, quantity, credit card number etc. Order page
publishing agent 1008 also requests touch tone input for
the account number, password, phone number, and credit
card number. Order page publishing agent 1008 uses the
account number and password for identification and
verification, the phone number to link it to the corresponding
yellow page 1004 and the credit card number for payment
for the order entry form. Once the business is identified and
verified, order page publishing agent 1008 dynamically
generates an order entry form for that business by filling the
supplied information into a standard order entry template for
that business category. Order page publishing agent 1008
uses the supplied business phone number to match with the
appropriate database entry in the business white pages,
updates it with the URL of the newly created order entry
page, locates the corresponding yellow page using its URL
in the database, and updates it to link to the newly created
order entry page.

The foregoing discussion discloses and describes merely
exemplary embodiments of the present invention. As will be
understood by those familiar with the art, the invention may
be embodied in other specific forms without departing from
the spirit or essential characteristics thereof. Accordingly,
the disclosure of the present invention is intended to be
illustrative, but not limiting, of the scope of the invention,
which is set forth in the following claims.

Appendix A 

I. HVML Specification 

Hyper Voice Markup Language consists of a set of 
extensions to existing HTML. Some of the extensions are 
new elements with new tags and attributes. Others are
extensions to existing elements in the form of new attributes. 
All attribute values are shown as % value type %. 
In-line Voice components 

The primary mechanism for introducing voice prompts 
into an HTML page is a new inline voice HVML element 
similar to the in-line image HTML element. The tag for this 
element is "VOICE" and it has many variations. Each 
variation is specified by value of the TYPE attribute. 
Depending on the type, each variation has additional 
attributes. 
Voice Files

<VOICE TYPE="File" SRC="% URL %" TEXT="% text %">

VOICE tag with TYPE set to "File" indicates a file
containing pre-recorded voice information. Its attributes are
SRC and TEXT. SRC attribute specifies the URL for the
voice file and TEXT attribute, which is optional, specifies
the text that can be translated to speech as an alternative to
the voice file.
Voice Index Files

<VOICE TYPE="Index" SRC="% URL %" INDEX="% index %" TEXT="% text %">

VOICE tag with TYPE set to "Index" indicates an
indexed file containing pre-recorded voice phrases. Its
attributes are SRC, INDEX and TEXT. SRC and TEXT have
same meaning as in Voice Files. The INDEX attribute
specifies index of the phrase within the file either as a
number or a label.

For example:

<VOICE TYPE="File" SRC="myweb/home/greeting.wav">
Text-to-Speech

<VOICE TYPE="Text" TEXT="% text %">

VOICE tag with TYPE set to "Text" indicates a text-to-speech string.
Its attribute is TEXT which specifies the string that needs
to be translated to speech.

For example:

<VOICE TYPE="Text" TEXT="Welcome to your Home Page">

Voice Streams:

<VOICE TYPE="Stream" VALUE="% URL %" TERMINATE="% tone %">

VOICE tag with TYPE set to "Stream" indicates a continuous
voice stream identified by its URL. The browser
accesses the voice stream and continuously plays it to the
user. Its attribute is TERMINATE which specifies the tone
the user can enter to terminate the playback.
Currency

<VOICE TYPE="Money" VALUE="% number %" FORMAT="% format %">

VOICE tag with TYPE set to "Money" indicates a number
that needs to be presented as currency. Its attributes are
VALUE and FORMAT. VALUE specifies the decimal value
of the number and FORMAT, which is optional, specifies the
currency type such as "US Dollar", "British Pound" etc. The
default value for FORMAT is "US Dollar".
Numbers

<VOICE TYPE="Number" VALUE="% number %" FORMAT="% format %">

VOICE tag with TYPE set to "Number" indicates a
number that needs to be presented as a decimal number. Its
attributes are VALUE and FORMAT. VALUE specifies the
decimal value and FORMAT, which is optional, specifies the
precision to be conveyed. Digits after the decimal point are
pronounced as characters. Default value for the FORMAT is
2 which indicates 2 digit precision after decimal point.
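
One possible reading of the "Number" type is sketched below in Python. It is an assumption-laden illustration, not the browser's actual algorithm: the function name `speak_number`, the "point" token and the half-up rounding rule are all choices made here, and the whole part is left as a single token for a text-to-speech engine to pronounce.

```python
from decimal import Decimal, ROUND_HALF_UP


def speak_number(value, precision=2):
    """Return spoken tokens for a decimal VALUE.

    The whole part is one token; digits after the decimal point are emitted
    one by one, as the Number type describes. Rounding half up to the
    requested precision is an assumption of this sketch.
    """
    quantum = Decimal(1).scaleb(-precision)  # Decimal("0.01") for precision 2
    rounded = Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP)
    whole, _, fraction = str(rounded).partition(".")
    tokens = [whole]
    if fraction:
        tokens.append("point")
        tokens.extend(fraction)  # each fractional digit becomes its own token
    return tokens
```

For example, `speak_number("12.5")` pads to the default two digit precision and yields the tokens `12`, `point`, `5`, `0`.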
Characters

<VOICE TYPE="Character" VALUE="% string %">

VOICE tag with TYPE set to "Character" indicates a
sequence of characters that are to be presented separately
with no pauses in between. Its attribute is VALUE which
specifies the sequence of characters as string.
Dates

<VOICE TYPE="Date" VALUE="% date %" FORMAT="% format %">

VOICE tag with TYPE set to "Date" indicates an expression
that is to be presented as a date. Its attributes are
VALUE and FORMAT. VALUE attribute specifies the
expression and the FORMAT attribute, which is optional,
specifies the format of the expression. Default format is
MM/DD/YY.
Ordinals

<VOICE TYPE="Ordinal" VALUE="% number %">

VOICE tag with TYPE set to "Ordinal" indicates a
number that is to be presented as an ordinal (i.e. as Nth
value). Its attribute is VALUE which specifies the number.
Values are pronounced as "first", "second", "third" etc.

Strings:

<VOICESTRING NAME="% name %">
. . . Voice Components . . .
</VOICESTRING>

VOICESTRING tag indicates a sequence of voice components
that are grouped together for presentation without
any pauses in between. Each of the voice components can be
any of the primitives previously defined. The voice browser
gathers the individual components and plays them together
in sequence.

<VoiceString NAME="welcome">

<Voice TYPE="Index" SRC="welcome.vap" INDEX="begin" TEXT="Welcome">

<Voice TYPE="File" SRC="username.vox" TEXT="user's name">

<Voice TYPE="Index" SRC="welcome.vap" INDEX="end" TEXT="to VOIS NET">

</VoiceString>

The voice browser "plays" each in-line voice component 
in sequence as it encounters it in the HVML page starting 
from the beginning of the page. Each voice component is 
played only once for each presentation. A "reload" com- 
mand would cause the voice browser to re-play the page. 

Of course, voice elements can also be invoked by hyper 
links pointing to voice files containing digitized voice data. 
This is similar to existing HTML conventions. The voice 
browser simply fetches the new page and plays it once. In 
the next section, we will discuss how hyperlinks can be 
invoked using touch tone or key word input. 
Voice responsive labels for hyper-links 

In order to invoke hyper links embedded in a HVML
page, two new attributes "TONE" and "LABEL" are added
to the anchor element. These attributes are used in conjunction
with the existing HREF attribute in an anchor element
that makes the anchor into a hyper link. When the user
selects the touch tone signals specified by the value of the
TONE attribute followed by the "#" tone or utters the word
specified by the LABEL attribute, the browser invokes the
corresponding hyper link. The TONE and LABEL attribute
values must be unique within a page.
For example:

<A HREF="myweb/home/greeting.vml" TONE="HELLO">
or

<A HREF="myweb/home/greeting.vml" LABEL="HELLO">

When the user presses "H, E, L, L, O, #" on the touch tone
phone or the user says the word "HELLO" on the phone, the
browser will invoke the corresponding hyper link and
access the "greeting.vml" page.
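
A voice browser could resolve TONE and LABEL input along the following lines. This Python sketch uses the standard `html.parser` module, which reports tag and attribute names in lower case. The class and function names, the `calendar.vml` page and the case-insensitive label matching are assumptions of the sketch; tone values are compared literally, so the mapping of letters to keypad digits is left out.

```python
from html.parser import HTMLParser


class AnchorIndex(HTMLParser):
    """Collect the TONE and LABEL attributes of anchors in an HVML page."""

    def __init__(self):
        super().__init__()
        self.by_tone = {}
        self.by_label = {}

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href is None:
            return
        for name, table in (("tone", self.by_tone), ("label", self.by_label)):
            value = attributes.get(name)
            if value is None:
                continue
            key = value.upper()
            if key in table:
                # TONE and LABEL values must be unique within a page.
                raise ValueError("duplicate %s value: %s" % (name.upper(), value))
            table[key] = href


def resolve(index, tones=None, word=None):
    """Return the HREF selected by touch tones ending in '#' or by a spoken word."""
    if tones is not None and tones.endswith("#"):
        return index.by_tone.get(tones[:-1].upper())
    if word is not None:
        return index.by_label.get(word.upper())
    return None


hvml_page = (
    '<A HREF="myweb/home/greeting.vml" TONE="1">Greeting</A>\n'
    '<A HREF="myweb/home/calendar.vml" LABEL="CALENDAR">Calendar</A>\n'  # hypothetical page
)
index = AnchorIndex()
index.feed(hvml_page)
```

Pressing "1" without the closing "#" resolves to nothing in this sketch, which mirrors the rule that the TONE sequence is followed by the "#" tone.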
Keyword accessible indexes for anchors 

HTML allows the index access of fragments within a page
by unique labels associated with anchors surrounding the
fragment. The NAME attribute in an anchor element specifies
a label that is unique within the page. This label can then
be used as an index by the browser to search for the fragment
by matching the unique label with the one supplied in the
hyperlink. The hyperlink for the indexed fragment uses the
regular URL for the page concatenated with the fragment's
unique label with a "#" separator.

Coupled with voice responsive hyper links, fragment
labels can be used to construct simple menus or database
searches.
For example:

Suppose "myweb/home/prompts.vml" contains the following
HVML text.
<A NAME="prompt1">

<VOICE TEXT="Press CAL# for Calendar">
</A>

<A NAME="prompt2">

<VOICE TEXT="Press ADDR# for Address Book">
</A>

<A NAME="prompt3">

<VOICE TEXT="Press EMAIL for Electronic Mail">
</A>

Suppose another HVML page contains the following
hyperlinks.

<A HREF="myweb/home/prompts.vml#prompt1"
TONE="1">Press 1 to hear Prompt1</A>

<A HREF="myweb/home/prompts.vml#prompt2"
TONE="2">Press 2 to hear Prompt2</A>

<A HREF="myweb/home/prompts.vml#prompt3"
TONE="3">Press 3 to hear Prompt3</A>

Then, if the user presses "1", the browser will fetch the
"myweb/home/prompts.vml" HVML page, match
"prompt1" index with the first anchor's "prompt1" label,
and start presenting the prompts starting with text-to-speech
translation of "Press CAL# for Calendar".
Browser Control

<PAUSE TIMEOUT="% seconds %" TERMINATE="% tone %">

In order to let the voice page publisher control the
behavior of the voice browser, HVML defines a tag "Pause"
with "TIMEOUT" and "TERMINATE" attributes. When the
browser encounters a PAUSE statement, it pauses until
either the amount of time specified in the TIMEOUT
attribute elapses or the user enters the tone specified in the
"TERMINATE" attribute. If the value of the TIMEOUT
attribute is 0, then the browser waits there indefinitely. The
default value for TIMEOUT is 1 second. Default value for
TERMINATE is

Voice Responsive Forms

HVML uses the FORM tag to enable user input similar to
HTML including the METHOD attribute which specifies the
way parameters are passed to the server and the ACTION
attribute which specifies the procedure to be invoked by the
server to process the form. HVML extends the INPUT tag
within forms by introducing VOICEINPUT tag. VOICEINPUT
takes a TYPE attribute similar to the INPUT tag with
three new values "voice", "tone" and "review" in addition to
the existing "reset" and "submit" values. The HVML
browser pauses at each VOICEINPUT statement in a HVML
form until the specified input is supplied or input is terminated
before processing the remaining form.

The VOICEINPUT tag with TYPE value set to "voice" 
indicates a form that accepts voice input. Usually, a voice 
prompt or text-to-speech segment precedes the VOICEINPUT
tag alerting the user that input is required and how to
terminate input. The user is expected to speak and this 
message is recorded in real-time and supplied to the Voice 
Web server for processing. The VOICEINPUT tag contain- 
ing "voice" value for the TYPE attribute also supports a 
MAXTIME attribute which specifies the maximum record- 
ing time for the message and a TERMINATE attribute which 
specifies the touch tone that terminates input. If the MAX- 
TIME attribute is not specified, then the default value of 
"15" is assumed. If TERMINATE attribute is not specified, 
then the default value of "#" is assumed. For example, if the 
MAXTIME value is 20 and TERMINATE value is "#", then 
recording terminates when the user presses "#" or 20 sec- 
onds of time elapses. 

The VOICEINPUT tag with TYPE value set to "tone"
indicates a form that accepts touch tone input. Again, a voice
prompt or a text-to-speech segment precedes the VOICEINPUT
tag alerting the user for input. The user is expected
to press a sequence of touch tones which are recorded and
supplied to the Voice Web server for processing. The VOICEINPUT
tag containing "tone" value for the TYPE
attribute also supports a MAXDIGITS attribute which specifies
the maximum number of touch tone digits that can be
supplied and a TERMINATE attribute which specifies the
touch tone that terminates input. If the MAXDIGITS
attribute is not specified, then the default value of "20" is
assumed. If TERMINATE attribute is not specified, then the
default value of is assumed. For example, if the MAXDIGITS
value is 10 and TERMINATE value is "#", then
input process terminates when the user presses "#" or 10
digits are supplied.
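
The termination rule for touch tone input can be sketched as a small Python function. The function name is hypothetical, and the default TERMINATE value of "#" is an assumption borrowed from the voice input case and from the example above, since the default is not legible in the text for tone input.

```python
def collect_tone_input(keys, max_digits=20, terminate="#"):
    """Consume key presses until TERMINATE is pressed or MAXDIGITS digits arrive.

    `keys` is any iterable of single-character key presses. The terminating
    tone itself is not part of the returned input.
    """
    digits = []
    for key in keys:
        if key == terminate:
            break
        digits.append(key)
        if len(digits) >= max_digits:
            break
    return "".join(digits)
```

With MAXDIGITS set to 10, `collect_tone_input("1234567890123", max_digits=10)` stops after the tenth digit, while `collect_tone_input("12345#678")` stops at the "#" tone and ignores what follows.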

The VOICEINPUT tag with TYPE value set to "review"
indicates that the current values of the form can be reviewed
by selecting the "review" input. The VOICEINPUT tag with
TYPE value set to "reset" indicates that the current values of
the form should be reset to their original defaults. The
VOICEINPUT tag with TYPE value set to "submit" indicates
that the current form should be submitted to the server.
Each of these three TYPE values supports a SELECTTONES
attribute and a SKIPTONES attribute. SELECTTONES
attribute specifies the sequence of touch tones that activates
the corresponding selection. SKIPTONES attribute specifies
the sequence of touch tones that skips the selection. If the
SELECTTONES attribute is not specified, then the default
value of "#" is assumed and if the SKIPTONES attribute is
not specified, then the default value of "*" is assumed.

For example, if the SELECTTONES attribute value is
"REVIEW" and SKIPTONES attribute value is "SKIP" for
a VOICEINPUT element with TYPE value set to "review",
the user can enter "REVIEW" to review the form values or
enter "SKIP" to skip the selection. VOICEINPUT tag with
TYPE value set to "submit" similarly indicates the values of
the form can be submitted to the server. If the SELECTTONES
attribute value is "DONE" and the SKIPTONES
attribute value is "**", the user can either enter "DONE" to
submit the form or press "**" to skip the selection. VOICEINPUT
tag with TYPE value set to "reset" similarly
indicates that the values of the form be reset to their original
values.

II. Voice Browser Commands

All browser commands must start with the "*" key. Each
browser command is associated with one or more key words
that uniquely identify it. For example, in order to activate
"Home" command, the user would press "*home" on the
telephone key pad. The key words are chosen in such a way
to generate unique dial tone sequences. A set of default
browser commands are listed below with the keyword and
description of the command. Alternatively, the browser
commands can also be issued by vocalizing the corresponding
commands. For example, to activate the "Home"
command, the user would say "home" on the telephone.
Previous

Jump to the previous page from which the current page
was accessed via a hyper link. This command is activated by
pressing "*pr" (*77) or "*prev" (*7738) sequence.
Next

Jump to the next page in a sequence of hyper links. This
command is activated by pressing "*n" (*6) or "*next"
(*6398) sequence.
History

Present the titles of the pages accessed so far in the order
of their hyper link access sequence. Pause after each title. If
the user presses "#", then jump to the page specified by the
title. If not, proceed to the next title. This command is
activated by pressing "*hi" (*44) or "*hist" (*4478)
sequence.
Home

Jump to the first page in the sequence of hyper links. This
command is activated by pressing "*ho" (*46) or "*home"
(*4663) sequence.
Reload

Reload the current page again from the Web server. This
command is activated by pressing "*re" (*73) or "*relo"
(*7356) sequence.
Help

Jump to the home page of the help page set. Help pages
are navigated in exactly the same way as ordinary HVML
pages. However, a new browser instance is created on
activation which must be "exited" to get back to the page
context from which "Help" page set was accessed. This
command is activated by pressing "*h" (*4) or "*help"
(*4357) sequence.
Fax

Jump to the home page of the Fax dialog session using
HTML forms. Again, a new browser instance is created on
activation which must be "exited" to get back to the page
context from which "Fax" dialog session was activated. This
command is activated by pressing "*fa" (*32) or "*fax" (*329)
sequence.
Stop

Stop loading the page that is currently being accessed.
This command is activated by pressing "*t" (*8) or "*stop"
(*7867) sequence.
Exit

Exit the current instance of the browser and return to the
page being accessed in the previous instance of the browser.
If this is the first instance of the browser, then exit the
browser and hang-up the phone. This command is activated
by pressing "*x" (*9) or "*exit" (*3948) sequence.
Bookmarks

Present the titles of the pages selected as bookmarks in the
order of their hyper link access sequence. Pause after each
title. If the user presses "#", then jump to the page specified
by the title. If not, proceed to the next title. This command
is activated by pressing "*bo" (*26) or "*book" (*2665)
sequence.
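
The digit sequences listed for the browser commands follow from spelling each key word on the telephone key pad. A minimal Python sketch of that mapping is given below; the letter layout used (including Q on 7 and Z on 9) is the common modern key pad arrangement and is an assumption here, as is the function name `command_tones`.

```python
# Letters printed on each telephone key (assumed standard layout).
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_DIGIT = {
    letter: digit for digit, letters in KEYPAD.items() for letter in letters
}


def command_tones(keyword):
    """Return the touch tone sequence for a browser command key word,
    including the leading '*' that starts every browser command."""
    return "*" + "".join(LETTER_TO_DIGIT[ch] for ch in keyword.lower())
```

For instance, `command_tones("home")` gives `*4663` and `command_tones("book")` gives `*2665`, matching the sequences listed for the Home and Bookmarks commands.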

III. Voice Browser Playback Controls 

When the Voice browser is activated to play back voice 
prompts or speech segments, an additional set of browser 
commands are available to the user to control the playback. 
Pause 

Pause the play back at current position. This command is 
activated by pressing "*p" (*7) or "*pause" (*72873).
Play 

Continue play back from current position. This command 
is activated by pressing "*p" (*7) or "*play" (*7529). 
Backup 

Back up the play back position by 5 seconds and start play 
back. The command is activated by pressing "*b" (*2) or 
"*back" (*2225). Repeated pressing of the same tone 
implies successive back up by 5 seconds for each tone. 
Forward 

Forward the play back position by 5 seconds and start play 
back. The command is activated by pressing "*f" (*3) or
"*frwd" (*3793). Repeated pressing of the same tone 
implies successive skip forward by 5 seconds for each tone. 
Start 

Back up the play back position to the beginning of the 
play back sequence and start play back. The command is 
activated by pressing "*0". 
End 

Jump to the end of the play back sequence, backup by 5 
seconds and start play back. The command is activated by 
pressing 
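
By way of illustration only, the effect of the playback controls on the play back position can be sketched as follows. The class name Playback and its methods are hypothetical. Clamping the position to the bounds of the sequence is an assumption, since the text does not state the behavior at the boundaries. No tone sequence is attached to the End command here because none is given above.

```python
STEP_SECONDS = 5.0  # step size stated in the text for Backup and Forward


class Playback:
    """Tracks a play back position in seconds (hypothetical helper)."""

    def __init__(self, duration: float) -> None:
        self.duration = duration
        self.position = 0.0
        self.playing = True

    def _seek(self, position: float) -> None:
        # Assumption: positions are clamped to the range [0, duration].
        self.position = min(max(position, 0.0), self.duration)
        self.playing = True  # every seek command also starts play back

    def pause(self) -> None:
        self.playing = False

    def play(self) -> None:
        self.playing = True

    def backup(self, presses: int = 1) -> None:
        # Repeated presses back up by 5 seconds for each tone.
        self._seek(self.position - STEP_SECONDS * presses)

    def forward(self, presses: int = 1) -> None:
        # Repeated presses skip forward by 5 seconds for each tone.
        self._seek(self.position + STEP_SECONDS * presses)

    def start(self) -> None:
        self._seek(0.0)

    def end(self) -> None:
        # Jump to the end, then back up by 5 seconds and start play back.
        self._seek(self.duration - STEP_SECONDS)


if __name__ == "__main__":
    p = Playback(duration=60.0)
    p.position = 20.0
    p.backup(presses=2)
    print(p.position)
```

In this sketch each seek command resumes play back, matching the "start play back" wording of the Backup, Forward, Start and End commands.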

What is claimed is: 

1. A method for delivering caller-customized services to a 
telephone caller, comprising: 

storing caller-specific information in a computer file on a 
computer network in accordance with a universal 
resource locator (URL) address wherein the stored 
caller-specific information includes a master voice sig- 
nature for the caller; 

prompting the caller to input identifying information; 

responsive to the identifying information, determining a 
URL for the file storing the caller-specific information; 

retrieving the caller-specific information from the file 
stored at the URL; and 

accessing information in a voice web in accordance with 
the caller-specific information wherein accessing infor- 
mation in a voice web in accordance with the caller- 
specific information comprises: 
prompting the caller for a voice signature, 
recording the voice signature, and 
comparing the voice signature to the recorded voice 
signature to determine whether there is a match. 

2. A method for delivering caller-customized services to a 
telephone caller, comprising: 

storing caller-specific information in a computer file on a 
computer network in accordance with a universal 
resource locator (URL) address wherein the stored 
caller-specific information includes a speaker depen- 
dent speech recognition training file for the caller; 
prompting the caller to input identifying information; 
responsive to the identifying information, determining 
a URL for the file storing the caller-specific infor- 
mation; 
retrieving the caller-specific information from the file 
stored at the URL; and 
accessing information in a voice web in accordance 
with the caller-specific information wherein access- 
ing information in a voice web in accordance with 
the caller-specific information comprises: 
prompting the caller for voice commands, 
recording the voice commands, and 
performing speaker dependent speech recognition on 
the voice commands using the training file for the 
caller. 

3. In a computer system coupled to a computer network, 
wherein the computer network is the Internet, a method of 
providing user specific input to a computer program, comprising: 

determining a universal resource locator (URL) address 
corresponding to a user; 
retrieving, over the computer network, a personal profile 
associated with the user wherein the personal profile 
includes data for voice authentication and is stored at 
the determined URL address; 
accessing information included in the personal profile to 
affect the execution of a computer program for navi- 
gating and accessing information in a voice web; 
receiving a user authentication request; 
retrieving user authentication data from the personal 
profile; 

collecting voice data from the user; 
processing the collected voice data; and 
comparing the processed voice data to the authentication 
data to authenticate the identity of the user. 

4. The method of claim 3 wherein collecting voice data 
from the user includes collecting voice data from the user 
using a telephone. 

5. In a computer system coupled to a computer network, 
wherein the computer network is the Internet, a method of 
providing user specific input to a computer program, com- 
prising: 

determining a universal resource locator (URL) address 
corresponding to a user; 
retrieving, over the computer network, a personal profile 
associated with the user wherein the personal profile 
includes data for speaker dependent speech recognition 
and is stored at the determined URL address; 
accessing information included in the personal profile to 
affect the execution of a computer program for navi- 
gating and accessing information in a voice web; 
receiving a voice command from the user; 
performing speaker dependent speech recognition to iden- 
tify the voice command; and 
executing the recognized voice command. 

6. The method of claim 5 wherein receiving a voice 
command from the user includes receiving a voice command 
from the user using a telephone. 

7. A speech processing system, comprising: 
a computer network; 

a gateway computer coupled to the computer network 
adapted to receive subscriber commands; 
a server computer program coupled to the network; 
a user profile stored on the computer network; 
voice web pages stored on the computer network wherein 
each voice web page is addressable by a universal 
resource locator (URL) address unique within the com- 
puter network and wherein each voice web page 
includes voice information; and 




speech processing software adapted to operate in the 
computer network for 
receiving a user identifier, 
receiving a command, 
determining a URL address associated with a voice web 
page responsive to the command, 
determining a URL address associated with the user 
profile responsive to the user identifier, 
retrieving the user profile, 
retrieving the voice web page, and 
generating an output responsive to the user command and 
information included in the retrieved voice web page 
and the user profile. 

8. The system of claim 7 wherein the computer network 
is an internet. 

9. The system of claim 8 wherein the user profile includes 
voice print information and wherein the command received 
by the speech processing software is a command to authen- 
ticate the identity of a user. 

10. The system of claim 8 wherein the user profile 
includes speech training information and wherein the com- 
mand received by the speech processing software is a 
digitized version of a spoken command and wherein the 
digitized version is processed using retrieved speech training 
information. 

11. The system of claim 8 further comprising a database 
query form customized in accordance with at least a portion 
of the user profile. 

12. The system of claim 8 wherein the speech processing 
software is further adapted to perform the method compris- 
ing: 

searching a database to return a query result; and 
presenting the query result responsive to at least a portion 
of the user profile. 

13. The system of claim 8 further comprising a user 
directory having a plurality of entries, each entry corre- 
sponding to a user identifier and each entry being mapped to 
a URL address and wherein determining the URL address 
associated with a voice document responsive to the user 
identifier and the command includes retrieving a URL 
address from the user directory. 

14. The system of claim 8 wherein the computer network 
is an internet. 

15. The system of claim 14 wherein the user profile 
includes voice signature information and wherein the com- 
mand received by the speech processing software is a 
command to authenticate the identity of a user. 

16. The system of claim 14 wherein the user profile 
includes speech training information and wherein the com- 
mand received by the speech processing software is a 
digitized version of a spoken command and wherein the 
digitized version is processed using retrieved speech training 
information. 

17. The system of claim 14 further comprising a database 
query form customized in accordance with at least a portion 
of the user profile. 

18. The system of claim 14 wherein the speech processing 
software is further adapted to perform the method compris- 
ing: 

searching a database to return a query result; and 
presenting the query result responsive to at least a portion 
of the user profile. 

19. The system of claim 14 further comprising a user 
directory having a plurality of entries, each entry corre- 
sponding to a user identifier and each entry being mapped to 
a URL address and wherein determining the URL address 
associated with a voice document responsive to the user 
identifier and the command includes retrieving a URL 
address from the user directory. 

20. A personal voice web for a subscriber comprising: 

a plurality of linked voice web pages, each page including 
an agent for performing various processing tasks 
required for each respective page and a specially tagged 
set of key words and touch tone sequences that are 
associated with embedded anchors and links used for 
navigation within the web; 
each voice web page having access to a respective speech 
training profiles web page, the speech training profiles 
web page comprising subscriber specific profiles, the 
profiles including component sets of related words 
likely to occur in combination within the respective 
voice web page, and each voice web page having 
access to an attributes and preferences web page having 
access to subscriber specific attributes and preferences 
specific to the respective voice web page; and 
said plurality of linked voice web pages including a 
personal profile page and service pages. 

21. The personal voice web of claim 20 wherein each of 
said service pages references a service agent communica- 
tively coupled to a service profile agent, wherein the service 
agent, in response to a subscriber's request for its corre- 
sponding service, retrieves information from a service 
database, retrieves from the service profile agent subscriber 
specific, service specific, speech training profiles and 
subscriber specific, service specific attributes and preferences, 
and customizes voice web pages using the retrieved speech 
training profiles and attributes and preferences for 
presentation to the subscriber. 

22. The personal voice web of claim 21 wherein the 
service pages include a calendar service page. 

23. The personal voice web of claim 21 wherein the 
service pages include an address book service page. 

24. The personal voice web of claim 21 wherein the 
service pages include an electronic mail service page. 

25. The personal voice web of claim 21 wherein the 
service pages include a stock portfolio service page. 

26. The personal voice web of claim 21 wherein the 
service pages include a news headlines service page. 

27. In a personal voice web comprising a plurality of 
linked voice web pages including a personal profile page and 
service pages, each voice web page having access to a 
respective speech training profiles web page, the speech 
training profiles web page comprising subscriber specific 
profiles, the profiles including component sets of related 
words likely to occur in combination within the respective 
voice web page, and each voice web page having access to 
an attributes and preferences web page having access to 
subscriber specific attributes and preferences specific to the 
respective voice web page, a method for providing customized 
interaction with a subscriber in response to a request for 
a service from the subscriber, the method comprising: 

responsive to the request for the service, retrieving 
information from a service database comprising information 
of the requested service; 
retrieving subscriber specific speech training profiles and 
subscriber specific attributes and preferences applicable 
to the requested service; and 
customizing voice web pages in accordance with the 
subscriber specific speech training profiles and 
attributes and preferences applicable to the requested 
service for presentation to the subscriber. 

28. The method of claim 27 wherein the service database 
includes a calendar service database. 

29. The method of claim 27 wherein the service database 
includes an address book service database. 

30. The method of claim 27 wherein the service database 
includes an electronic mail service database. 

31. The method of claim 27 wherein the service database 
includes a stock portfolio service database. 

32. The method of claim 27 wherein the service database 
includes a news headlines service database. 